ANALYSIS OF GENETIC POLYMORPHISMS AND GENE COPY NUMBER



STATEMENT OF GOVERNMENT INTEREST 

Research leading to the invention was funded in part by
NIH grant No. 1R01HG00813-01, and the government may have
certain rights to the invention.

BACKGROUND OF THE INVENTION 

FIELD OF THE INVENTION 

The present invention resides in the field of molecular 
genetics and diagnostics. 

DESCRIPTION OF RELATED ART

Virtually all substances introduced into the human body
(xenobiotics) as well as most endogenous compounds
(endobiotics) undergo some form of biotransformation in order
to be eliminated from the body. Many enzymes contribute to
the phase I and phase II metabolic pathways responsible for
this bioprocessing. Phase I enzymes include reductases,
oxidases and hydrolases. Among the phase I enzymes are the
cytochromes P450, a superfamily of hemoproteins involved in
the oxidative metabolism of steroids, fatty acids,
prostaglandins, leukotrienes, biogenic amines, pheromones,
plant metabolites and chemical carcinogens as well as a large
number of important drugs (Heim & Meyer, Genomics 14, 49-58
(1992)). Phase II enzymes are primarily transferases
responsible for transferring glucuronic acid, sulfate or
glutathione to compounds already processed by phase I enzymes
(Gonzales & Idle, Clin. Pharmacokinet. 26, 59-70 (1994)).
Phase II enzymes include epoxide hydrolase, catalase,
glutathione peroxidase, superoxide dismutase and glutathione
S-transferase.

Many drugs are metabolized by biotransformation enzymes.
For some drugs, metabolism occurs after the drug has exerted
its desired effect, and results in detoxification of the drug
and elimination of the drug from the body. Similarly, the
biotransformation enzymes also have roles in detoxifying
harmful environmental compounds. For other drugs, metabolism
is required to convert the drug to an active state before the
drug can exert its desired effect.

Genetic polymorphisms of cytochromes P450 and other
biotransformation enzymes result in phenotypically distinct
subpopulations that differ in their ability to perform
biotransformations of particular drugs and other chemical
compounds. These phenotypic distinctions have important
implications for selection of drugs. For example, a drug that
is safe when administered to most humans may cause intolerable
side-effects in an individual suffering from a defect in an
enzyme required for detoxification of the drug.
Alternatively, a drug that is effective in most humans may be
ineffective in a particular subpopulation because of lack of an
enzyme required for conversion of the drug to a metabolically
active form. Further, individuals lacking a biotransformation
enzyme are often susceptible to cancers from environmental
chemicals due to inability to detoxify the chemicals.
Eichelbaum et al., Toxicology Letters 64/65, 155-122 (1992).
Accordingly, it is important to identify individuals who are
deficient in a particular P450 enzyme, so that drugs known or
suspected of being metabolized by the enzyme are not used, or
are used only with special precautions (e.g., reduced dosage,
close monitoring) in such individuals. Identification of such
individuals is also important so that such individuals can be
subjected to regular monitoring for the onset of cancers.

Existing methods of identifying deficiencies are not
entirely satisfactory. Patient metabolic profiles are
currently assessed with a bioassay after administration of a
probe drug. For example, a poor drug metabolizer with a
CYP2D6 defect is identified by administering one of the probe
drugs, debrisoquine, sparteine or dextromethorphan, then
testing urine for the ratio of unmodified to modified drug.
Poor metabolizers (PM) exhibit physiologic accumulation of
unmodified drug and have a high metabolic ratio of probe drug
to metabolite. This bioassay has a number of limitations:
lack of patient cooperation, adverse reactions to probe drugs,
and inaccuracy due to coadministration of other
pharmacological agents or disease effects. Genetic assays by
RFLP (restriction fragment length polymorphism), ASO PCR
(allele specific oligonucleotide hybridization to PCR products
or PCR using mutant/wildtype specific oligo primers), SSCP
(single stranded conformation polymorphism), TGGE/DGGE
(temperature or denaturing gradient gel electrophoresis) and MDE
(mutation detection electrophoresis) are time-consuming,
technically demanding and limited in the number of gene
mutation sites that can be tested at one time.

The difficulties inherent in previous methods are
overcome by the use of DNA chips to analyze mutations in
biotransformation genes. The development of VLSIPS™
technology has provided methods for making very large arrays
of oligonucleotide probes in very small areas. See U.S.
5,143,854, WO 90/15070 and WO 92/10092, each of which is
incorporated herein by reference.

Microfabricated arrays of large numbers of
oligonucleotide probes, called "DNA chips," offer great promise
for a wide variety of applications. The present application
describes the use of such chips for, inter alia, analysis of
polymorphisms and copy number variations in genes of interest,
particularly biotransformation genes, such as cytochromes
P450.



SUMMARY OF THE INVENTION 

The invention provides methods for determining the copy
number of a gene present in an individual. In such methods, a
plurality of polymorphic sites from an individual are analyzed
and the number of different polymorphic forms present at each
site is thereby determined. Gene copy number is then assigned
as the highest number of polymorphic forms present at a single
site. Typically, the polymorphisms are in the gene whose copy
number is being determined or in flanking sequences, although
the polymorphisms can be present elsewhere provided they are on
the same chromosome as the gene whose copy number is being
determined. To illustrate, if a single polymorphic form is
present at each of the plurality of sites, the copy number of
the gene is assigned as 1. If two polymorphic forms are
present at one site and a single polymorphic form is present
at each other of the plurality of sites, the copy number of
the gene is assigned as 2. If three polymorphic forms are
present at a first polymorphic site, a single polymorphic form
is present at a second polymorphic site and two polymorphic
forms are present at a third polymorphic site, the copy
number of the gene is assigned as 3.
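
By way of illustration only, the assignment rule described above
can be sketched in Python. The function name, the site labels and
the dictionary layout are hypothetical and are not part of the
specification; the sketch assumes the polymorphic forms at each
site have already been determined.

```python
def assign_copy_number(forms_by_site):
    """Assign gene copy number as the highest number of distinct
    polymorphic forms observed at any single polymorphic site.

    forms_by_site: dict mapping a (hypothetical) site label to an
    iterable of the polymorphic forms detected at that site.
    """
    if not forms_by_site:
        raise ValueError("at least one polymorphic site is required")
    # Count distinct forms per site, then take the maximum over sites.
    return max(len(set(forms)) for forms in forms_by_site.values())


# The three illustrations given in the text:
one_copy = assign_copy_number({"site1": ["A"], "site2": ["G"]})
two_copies = assign_copy_number({"site1": ["A", "G"], "site2": ["C"]})
three_copies = assign_copy_number(
    {"site1": ["A", "C", "G"], "site2": ["T"], "site3": ["A", "G"]}
)
```

Because the rule takes a maximum over sites, it can only
underestimate the copy number when copies happen to share the same
form at every site analyzed, which is consistent with the text's
remark that analyzing more polymorphisms gives a more accurate
result.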

Often some or all of the polymorphisms analyzed are
silent polymorphisms. Such silent polymorphisms can be
present in a noncoding segment of the gene, such as an
intronic segment, or in sequences flanking the gene. The more
polymorphisms analyzed, the more likely one is to obtain an
accurate result. Typically, analysis of about 10 or 50
polymorphisms is sufficient. Nucleic acids for analysis are
typically prepared by obtaining a tissue sample from the
individual containing the gene and amplifying the gene or a
fragment thereof.

Polymorphisms are typically analyzed using probe arrays.
Such analysis can be performed by contacting a nucleic acid
comprising the gene or a fragment thereof with an array of
oligonucleotides, the array comprising a plurality of
subarrays, each subarray spanning a polymorphic site and
exhibiting complementarity to at least one polymorphic form of
the gene at the site. Hybridization intensities of the nucleic
acid to the oligonucleotides in the array are then detected. The
pattern of hybridization indicates the number of polymorphic
forms present at each polymorphic site. In some methods,
subarrays are subdivided into probe groups, with different
probe groups comprising probes complementary to different
polymorphic forms at a site. In some methods, probe groups
are subdivided into two or more probe sets. A first probe set
comprises a plurality of probes spanning a polymorphic site of
the gene, each probe comprising a segment of at least six
nucleotides exactly complementary to a polymorphic form of the
gene at the site, the segment including at least one
interrogation position complementary to a corresponding
nucleotide in the polymorphic form. A second probe set
comprises a corresponding probe for each probe in the first
probe set, the corresponding probe in the second probe set
being identical to a sequence comprising the corresponding
probe from the first probe set or a subsequence of at least
six nucleotides thereof that includes the at least one
interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the two corresponding probes from the first and
second probe sets. In some methods, third and fourth probe
sets are also present. In such methods, the second, third and
fourth probe sets each comprise a corresponding probe for
each probe in the first probe set, the probes in the second,
third and fourth probe sets being identical to a sequence
comprising the corresponding probe from the first probe set or
a subsequence of at least six nucleotides thereof that
includes the at least one interrogation position, except that
the at least one interrogation position is occupied by a
different nucleotide in each of the four corresponding probes
from the four probe sets.

Often, the methods also analyze a phenotype-determining
polymorphic site in the same gene as the polymorphisms used to
determine copy number to determine which polymorphic form(s)
are present at the site. This information can be used to
diagnose a phenotype of the patient based on the polymorphic
form(s) present at the phenotype-determining polymorphic site.
In some methods, analysis of polymorphisms for
determination of copy number and analysis of a
phenotype-determining polymorphism are performed using the same
probe array. Such methods entail hybridizing a sample comprising a
target nucleic acid comprising one or more alleles of the gene
to an array of oligonucleotide probes immobilized on a solid
support. Such an array comprises a first probe set comprising
a plurality of probes, each probe comprising a segment of at
least six nucleotides exactly complementary to a reference
form of the gene, the segment including at least one
interrogation position complementary to a corresponding
nucleotide in the reference form of the gene, the reference
form of the gene having a silent polymorphic site and a site
of potential mutation associated with a phenotypic change.
Such an array also contains a second, and often third and
fourth, probe set. The second, third and fourth probe sets
each comprise a corresponding probe for each probe in the
first probe set, the probes in the second, third and fourth
probe sets being identical to a sequence comprising the
corresponding probe from the first probe set or a subsequence
of at least six nucleotides thereof that includes the at least
one interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the four corresponding probes from the four probe
sets. The method entails determining which probes, relative
to one another, bind to the target nucleic acid, whereby the
relative binding of probes having an interrogation position
aligned with the silent polymorphism indicates the number of
different alleles of the gene in the sample and the relative
binding of probes having an interrogation position aligned
with the mutation indicates whether the mutation is present in
at least one of the alleles.

The invention further provides arrays of probes
immobilized on a solid support for analyzing biotransformation
genes. In a first embodiment, the invention provides a tiling
strategy employing an array of immobilized oligonucleotide
probes comprising at least two sets of probes. A first probe
set comprises a plurality of probes, each probe comprising a
segment of at least three nucleotides exactly complementary to
a subsequence of a reference sequence from a biotransformation
gene, the segment including at least one interrogation
position complementary to a corresponding nucleotide in the
reference sequence. A second probe set comprises a
corresponding probe for each probe in the first probe set, the
corresponding probe in the second probe set being identical to
a sequence comprising the corresponding probe from the first
probe set or a subsequence of at least three nucleotides
thereof that includes the at least one interrogation position,
except that the at least one interrogation position is
occupied by a different nucleotide in each of the two
corresponding probes from the first and second probe sets.
The probes in the first probe set have at least two
interrogation positions corresponding to two contiguous
nucleotides in the reference sequence. One interrogation
position corresponds to one of the contiguous nucleotides, and
the other interrogation position to the other. In this, and
other forms of array, biotransformation genes of particular
interest for analysis include cytochromes P450, particularly
2D6 and 2C19, N-acetyl transferase II, glucose 6-phosphate
dehydrogenase, pseudocholinesterase, catechol-O-methyl
transferase, and dihydropyridine dehydrogenase.

In a second embodiment, the invention provides a tiling
strategy employing an array comprising four probe sets. A
first probe set comprises a plurality of probes, each probe
comprising a segment of at least three nucleotides exactly
complementary to a subsequence of a reference sequence from a
biotransformation gene, the segment including at least one
interrogation position complementary to a corresponding
nucleotide in the reference sequence. Second, third and
fourth probe sets each comprise a corresponding probe for each
probe in the first probe set. The probes in the second, third
and fourth probe sets are identical to a sequence comprising
the corresponding probe from the first probe set or a
subsequence of at least three nucleotides thereof that
includes the at least one interrogation position, except that
the at least one interrogation position is occupied by a
different nucleotide in each of the four corresponding probes
from the four probe sets.
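
As a non-limiting sketch, the four probe sets of this embodiment
can be generated in Python as shown below. The function name, the
default probe length and the position of the interrogation
position are assumptions made for illustration. Probes are written
here as the base-by-base complement aligned with the reference
sequence (i.e., in the 3' to 5' direction when the reference is
read 5' to 3'), which is also an assumption of the sketch.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_probe_sets(reference, probe_length=15, interrogation_index=7):
    """Return one group of four corresponding probes per tiled window.

    Each group holds the perfectly matched probe (first probe set)
    and three probes that differ from it only at the interrogation
    position (second, third and fourth probe sets).
    """
    if not 0 <= interrogation_index < probe_length:
        raise ValueError("interrogation position must lie inside the probe")
    groups = []
    for start in range(len(reference) - probe_length + 1):
        window = reference[start:start + probe_length]
        matched = "".join(COMPLEMENT[base] for base in window)
        probes = {
            base: matched[:interrogation_index]
            + base
            + matched[interrogation_index + 1:]
            for base in "ACGT"
        }
        groups.append(
            {
                # 0-based index of the corresponding reference nucleotide
                "ref_position": start + interrogation_index,
                "perfect_match": matched,
                # keyed by the nucleotide at the interrogation position
                "probes": probes,
            }
        )
    return groups


groups = tile_probe_sets("ACGTACGTA", probe_length=5, interrogation_index=2)
```

Each successive group advances by one nucleotide along the
reference sequence, so every reference nucleotide that falls at an
interrogation position is represented by four corresponding probes.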

In a third embodiment, the invention provides arrays
comprising first and second groups of probe sets, each group
comprising first, second and optionally, third and fourth
probe sets as defined above. The first probe sets in the
first and second groups are designed to be exactly
complementary to first and second reference sequences. For
example, the first reference sequence can include a site of
mutation rendering the gene nonfunctional, and the second
reference sequence can include a site of a silent polymorphism.

In a fourth embodiment, the invention provides a block of
oligonucleotide probes (sometimes referred to as an
optiblock) immobilized on a support. The array comprises a
perfectly matched probe comprising a segment of at least three
nucleotides exactly complementary to a subsequence of a
reference sequence from a biotransformation gene, the segment
having a plurality of interrogation positions respectively
corresponding to a plurality of nucleotides in the reference
sequence. For each interrogation position, the array further
comprises three mismatched probes, each identical to a
sequence comprising the perfectly matched probe or a
subsequence of at least three nucleotides thereof including
the plurality of interrogation positions, except in the
interrogation position, which is occupied by a different
nucleotide in each of the three mismatched probes and the
perfectly matched probe.
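
The block arrangement can be illustrated with the following Python
sketch, in which one perfectly matched probe is shared by several
interrogation positions and three mismatched probes are generated
for each. The function name, the example probe and the chosen
positions are hypothetical, and the sketch uses full-length
mismatched probes rather than subsequences.

```python
def block_tiling(perfect_match, interrogation_indices):
    """Map each interrogation position (0-based index into the
    perfectly matched probe) to its three mismatched probes."""
    block = {}
    for i in interrogation_indices:
        block[i] = [
            perfect_match[:i] + base + perfect_match[i + 1:]
            for base in "ACGT"
            if base != perfect_match[i]  # skip the perfectly matched base
        ]
    return block


block = block_tiling("TGCAT", [1, 2, 3])
# One perfectly matched probe plus three mismatches per position.
total_probes = 1 + sum(len(probes) for probes in block.values())
```

For n interrogation positions the block therefore needs 3n + 1
probes, compared with 4n probes when a separate perfectly matched
probe is provided for each position.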

In a fifth embodiment (sometimes referred to as
deletion tiling), the invention provides an array comprising
at least four probes. A first probe comprises first and
second segments, each of at least three nucleotides and
exactly complementary to first and second subsequences of a
reference sequence from a biotransformation gene, the segments
including at least one interrogation position corresponding to
a nucleotide in the reference sequence, wherein either (1) the
first and second subsequences are noncontiguous, or (2) the
first and second subsequences are contiguous and the first and
second segments are inverted relative to the complement of the
first and second subsequences in the reference sequence. The
array further comprises second, third and fourth probes,
identical to a sequence comprising the first probe or a
subsequence thereof comprising at least three nucleotides from
each of the first and second segments, except in the at least
one interrogation position, which differs in each of the
probes.

In a sixth embodiment, the invention provides a method
of comparing a target nucleic acid with a reference sequence
from a biotransformation gene. The method comprises
hybridizing a sample comprising the target nucleic acid to one
of the arrays of oligonucleotide probes described above. The
method then determines which probes, relative to one another,
specifically bind to the target nucleic acid, the relative
specific binding of corresponding probes indicating whether a
nucleotide in the target sequence is the same as or different
from the corresponding nucleotide in the reference sequence.

For example, for the array of the second embodiment which
has four probe sets, the array can be analyzed by comparing
the relative specific binding of four corresponding probes
from the first, second, third and fourth probe sets, assigning
a nucleotide in the target sequence as the complement of the
interrogation position of the probe having the greatest
specific binding, and repeating these steps until each
nucleotide of interest in the target sequence has been
assigned.
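
The assignment step just described can be sketched in Python as
follows. The function names and the intensity values are
hypothetical; the sketch assumes that a hybridization intensity
has already been measured for each of the four corresponding
probes and does not attempt to handle ties or ambiguous signals.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(intensities):
    """Assign a target nucleotide from four corresponding probes.

    intensities: dict mapping the nucleotide occupying the
    interrogation position of each probe to its measured signal.
    The target nucleotide is the complement of the interrogation
    nucleotide of the probe with the greatest specific binding.
    """
    brightest = max(intensities, key=intensities.get)
    return COMPLEMENT[brightest]


def call_sequence(intensities_per_position):
    """Repeat the assignment for each nucleotide of interest."""
    return "".join(call_base(i) for i in intensities_per_position)


# Hypothetical intensities for two adjacent nucleotides of interest.
called = call_sequence(
    [
        {"A": 10, "C": 5, "G": 3, "T": 120},  # T probe brightest: target A
        {"A": 8, "C": 95, "G": 4, "T": 6},    # C probe brightest: target G
    ]
)
```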

In some methods, the reference sequence includes a site
of a mutation in the biotransformation gene and a silent
polymorphism in or flanking the biotransformation gene, and
the target nucleic acid comprises one or more different
alleles of the biotransformation gene. In this situation,
the relative specific binding of probes having an
interrogation position aligned with the silent polymorphism
indicates the number of different alleles and the relative
specific binding of probes having an interrogation position
aligned with the mutation indicates whether the mutation is
present in at least one of the alleles.



BRIEF DESCRIPTION OF THE FIGURES 
Fig. 1: Basic tiling strategy. The figure illustrates
the relationship between an interrogation position (I) and a
corresponding nucleotide (n) in the reference sequence, and
between a probe from the first probe set and corresponding
probes from second, third and fourth probe sets.

Fig. 2: Segment of complementarity in a probe from the
first probe set.

Fig. 3A: Incremental succession of probes in a basic
tiling strategy. The figure shows four probe sets, each
having three probes. Note that each probe differs from its
predecessor in the same set by the acquisition of a 5'
nucleotide and the loss of a 3' nucleotide, as well as in the
nucleotide occupying the interrogation position.

Fig. 3B: Arrangement of probe sets in tiling arrays
lacking a perfectly matched probe set.

Fig. 4A: Exemplary arrangement of lanes on a chip. The
chip shows four probe sets, each having five probes and each
having a total of five interrogation positions (I1-I5), one
per probe.

Fig. 4B: A tiling strategy for analyzing closely spaced
mutations.

Fig. 4C: A tiling strategy for avoiding loss of signal
due to probe self-annealing.

Fig. 5: Hybridization pattern of chip having probes laid
down in lanes. Dark patches indicate hybridization. The
probes in the lower part of the figure occur at the column of
the array indicated by the arrow when the probe length is 15
and the interrogation position 7.

Fig. 6: Strategies for detecting deletion and insertion 
mutations. Bases in brackets may or may not be present. 

Fig. 7: Block tiling strategy. The perfectly matched
probe has three interrogation positions. The probes from the
other probe sets have only one of these interrogation
positions.

Fig. 8: Multiplex tiling strategy. Each probe has two 
interrogation positions. 

Fig. 9: Helper mutation strategy. The segment of
complementarity differs from the complement of the reference
sequence at a helper mutation as well as the interrogation
position.

Fig. 10: Layout of probes on chip for analysis of
cytochrome P450 2D6 and cytochrome P450 2C19.

Fig. 11: Alternative tiling for analysis of
CYP2D6/CYP2D7 polymorphism.

Fig. 12: Optiblock for analysis of CYP2D6 P34S 
polymorphism. 

Fig. 13: The chip shown in Fig. 10 hybridized to a
CYP2D6-B target.

Fig. 14: Magnification of the hybridization patterns of
the cytochrome P450 2D6 L421P and S486 polymorphism
opti-tiling blocks.

Fig. 15: Hybridization of the chip shown in Fig. 10 to
cytochrome P450 2C19.



DETAILED DESCRIPTION OF THE INVENTION 

The invention provides a number of strategies for
comparing a polynucleotide of known sequence (a reference
sequence) with variants of that sequence (target sequences).
The comparison can be performed at the level of entire
genomes, chromosomes, genes, exons or introns, or can focus on
individual mutant sites and immediately adjacent bases. The
strategies allow detection of variations, such as mutations or
polymorphisms, in the target sequence irrespective of whether a
particular variant has previously been characterized. The
strategies both define the nature of a variant and identify
its location in a target sequence.

The strategies employ arrays of oligonucleotide probes
immobilized to a solid support. Target sequences are analyzed
by determining the extent of hybridization at particular
probes in the array. The strategy in selection of probes
facilitates distinction between perfectly matched probes and
probes showing single-base or other degrees of mismatches.
The strategy usually entails sampling each nucleotide of
interest in a target sequence several times, thereby achieving
a high degree of confidence in its identity. This level of
confidence is further increased by sampling of nucleotides in
the target sequence adjacent to the nucleotides of interest.
The present tiling strategies result in sequencing and
comparison methods suitable for routine large-scale practice
with a high degree of confidence in the sequence output.

I. GENERAL TILING STRATEGIES

A. Selection of Reference Sequence 

The chips are designed to contain probes exhibiting
complementarity to one or more selected reference sequences
whose sequence is known. The chips are used to read a target
sequence comprising either the reference sequence itself or
variants of that sequence. Target sequences may differ from
the reference sequence at one or more positions but show a



WO 98/30883 PCT/US98/06414

high overall degree of sequence identity with the reference
sequence (e.g., at least 75, 90, 95, 99, 99.9 or 99.99%). Any
polynucleotide of known sequence can be selected as a
reference sequence. Reference sequences of interest include
sequences known to include mutations or polymorphisms
associated with phenotypic changes having clinical
significance in human patients. For example, the CFTR gene
and P53 gene in humans have been identified as the location of
several mutations resulting in cystic fibrosis or cancer,
respectively. Other reference sequences of interest include
those that serve to identify pathogenic microorganisms and/or
are the site of mutations by which such microorganisms acquire
drug resistance (e.g., the HIV reverse transcriptase gene).
Other reference sequences of interest include regions where
polymorphic variations are known to occur (e.g., the D-loop
region of mitochondrial DNA). These reference sequences have
utility for, e.g., forensic or epidemiological studies. Other
reference sequences of interest include p34 (related to p53),
p65 (implicated in breast, prostate and liver cancer), and DNA
segments encoding cytochromes P450 and other biotransformation
genes (see Meyer et al., Pharmac. Ther. 46, 349-355 (1990)).
Other reference sequences of interest include HLA classes I
and II. Other reference sequences of interest include those
from the genome of pathogenic viruses (e.g., hepatitis (A, B,
or C), herpes virus (e.g., VZV, HSV-1, HAV-6, HSV-II, and CMV,
Epstein Barr virus), adenovirus, influenza virus,
flaviviruses, echovirus, rhinovirus, coxsackie virus,
coronavirus, respiratory syncytial virus, mumps virus,
rotavirus, measles virus, rubella virus, parvovirus, vaccinia
virus, HTLV virus, dengue virus, papillomavirus, molluscum
virus, poliovirus, rabies virus, JC virus and arboviral
encephalitis virus). Other reference sequences of interest are
from genomes or episomes of pathogenic bacteria, particularly
regions that confer drug resistance or allow phylogenic
characterization of the host (e.g., 16S rRNA or corresponding
DNA). For example, such bacteria include chlamydia,
rickettsial bacteria, mycobacteria, staphylococci, streptococci,
pneumococci, meningococci and gonococci, klebsiella,
proteus, serratia, pseudomonas, legionella, diphtheria,
salmonella, bacilli, cholera, tetanus, botulism, anthrax,
plague, leptospirosis, and Lyme disease bacteria. Other
reference sequences of interest include those in which
mutations result in the following autosomal recessive
disorders: sickle cell anemia, β-thalassemia,
phenylketonuria, galactosemia, Wilson's disease,
hemochromatosis, severe combined immunodeficiency,
alpha-1-antitrypsin deficiency, albinism, alkaptonuria, lysosomal
storage diseases and Ehlers-Danlos syndrome. Other reference
sequences of interest include those in which mutations result
in X-linked recessive disorders: hemophilia,
glucose-6-phosphate dehydrogenase deficiency, agammaglobulinemia,
diabetes insipidus, Lesch-Nyhan syndrome, muscular dystrophy,
Wiskott-Aldrich syndrome, Fabry's disease and fragile X-syndrome.
Other reference sequences of interest include those in which
mutations result in the following autosomal dominant
disorders: familial hypercholesterolemia, polycystic kidney
disease, Huntington's disease, hereditary spherocytosis,
Marfan's syndrome, von Willebrand's disease,
neurofibromatosis, tuberous sclerosis, hereditary hemorrhagic
telangiectasia, familial colonic polyposis, Ehlers-Danlos
syndrome, myotonic dystrophy, muscular dystrophy, osteogenesis
imperfecta, acute intermittent porphyria, and von
Hippel-Lindau disease.

The length of a reference sequence can vary widely from a
full-length genome, to an individual chromosome, episome,
gene, component of a gene, such as an exon, intron or
regulatory sequences, to a few nucleotides. A reference
sequence of between about 2, 5, 10, 20, 50, 100, 500, 1000,
5,000 or 10,000, 20,000 or 100,000 nucleotides is common.
Sometimes only particular regions of a sequence (e.g., exons
of a gene) are of interest. In such situations, the
particular regions can be considered as separate reference
sequences or can be considered as components of a single
reference sequence, as a matter of arbitrary choice.

A reference sequence can be any naturally occurring,
mutant, consensus or purely hypothetical sequence of
nucleotides, RNA or DNA. For example, sequences can be
obtained from computer data bases, publications or can be
determined or conceived de novo. Usually, a reference
sequence is selected to show a high degree of sequence
identity to envisaged target sequences. Often, particularly
where a significant degree of divergence is anticipated
between target sequences, more than one reference sequence is
selected. Combinations of wildtype and mutant reference
sequences are employed in several applications of the tiling
strategy.

B. Chip Design 

1. Basic Tiling Strategy 

The basic tiling strategy provides an array of
immobilized probes for analysis of target sequences showing a
high degree of sequence identity to one or more selected
reference sequences. The strategy is first illustrated for an
array that is subdivided into four probe sets, although it
will be apparent that in some situations, satisfactory results
are obtained from only two probe sets. A first probe set
comprises a plurality of probes exhibiting perfect
complementarity with a selected reference sequence. The
perfect complementarity usually exists throughout the length
of the probe. However, probes having a segment or segments of
perfect complementarity that is/are flanked by leading or
trailing sequences lacking complementarity to the reference
sequence can also be used. Within a segment of
complementarity, each probe in the first probe set has at
least one interrogation position that corresponds to a
nucleotide in the reference sequence. That is, the
interrogation position is aligned with the corresponding
nucleotide in the reference sequence, when the probe and
reference sequence are aligned to maximize complementarity
between the two. If a probe has more than one interrogation
position, each corresponds with a respective nucleotide in the
reference sequence. The identity of an interrogation position
and corresponding nucleotide in a particular probe in the
first probe set cannot be determined simply by inspection of
the probe in the first set. As will become apparent, an
interrogation position and corresponding nucleotide is defined
by the comparative structures of probes in the first probe set
and corresponding probes from additional probe sets.

In principle, a probe could have an interrogation
position at each position in the segment complementary to the
reference sequence. Sometimes, interrogation positions
provide more accurate data when located away from the ends of
a segment of complementarity. Thus, typically a probe having
a segment of complementarity of length x does not contain more
than x-2 interrogation positions. Since probes are typically
9-21 nucleotides, and usually all of a probe is complementary,
a probe typically has 1-19 interrogation positions. Often the
probes contain a single interrogation position, at or near the
center of the probe.
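
The arithmetic in the preceding paragraph can be checked with a
short Python sketch. The function names are hypothetical, and the
choice of the middle index as the "central" position is one
possible reading of "at or near the center" adopted only for
illustration.

```python
def max_interrogation_positions(segment_length):
    """Upper limit of x-2 interrogation positions for a segment of
    complementarity of length x (the two terminal positions are
    excluded)."""
    if segment_length < 3:
        raise ValueError("segment must be at least three nucleotides")
    return segment_length - 2


def central_interrogation_index(probe_length):
    """0-based index of a single, centrally placed interrogation
    position (an assumption of this sketch)."""
    return probe_length // 2


shortest = max_interrogation_positions(9)   # 9-mer probe
longest = max_interrogation_positions(21)   # 21-mer probe
center_15mer = central_interrogation_index(15)
```

For the typical 9-21 nucleotide probes, the upper limit thus runs
from 7 to 19 positions, and the stated range of 1-19 follows when
probes with a single interrogation position are included.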

For each probe in the first set, there are, for
purposes of the present illustration, up to three
corresponding probes from three additional probe sets. See
Fig. 1. Thus, there are four probes corresponding to each
nucleotide of interest in the reference sequence. Each of the
four corresponding probes has an interrogation position
aligned with that nucleotide of interest. Usually, the probes
from the three additional probe sets are identical to the
corresponding probe from the first probe set with one
exception. The exception is that at least one (and often only
one) interrogation position, which occurs in the same position
in each of the four corresponding probes from the four probe
sets, is occupied by a different nucleotide in the four probe
sets. For example, for an A nucleotide in the reference
sequence, the corresponding probe from the first probe set has
its interrogation position occupied by a T, and the
corresponding probes from the additional three probe sets have
their respective interrogation positions occupied by A, C, or
G, a different nucleotide in each probe. Of course, if a
probe from the first probe set comprises trailing or flanking
sequences lacking complementarity to the reference sequence
(see Fig. 2), these sequences need not be present in
corresponding probes from the three additional sets. Likewise,
corresponding probes from the three additional sets can
contain leading or trailing sequences outside the segment of
complementarity that are not present in the corresponding
probe from the first probe set. Occasionally, the probes from
the additional three probe sets are identical (with the
exception of interrogation position(s)) to a contiguous
subsequence of the full complementary segment of the
corresponding probe from the first probe set. In this case,
the subsequence includes the interrogation position and
usually differs from the full-length probe only in the
omission of one or both terminal nucleotides from the termini
of a segment of complementarity. That is, if a probe from the
first probe set has a segment of complementarity of length n,
corresponding probes from the other sets will usually include
a subsequence of the segment of at least length n-2. Thus,
the subsequence is usually at least 3, 4, 7, 9, 15, 21, or 25
nucleotides long, most typically, in the range of 9-21
nucleotides. The subsequence should be sufficiently long to
allow a probe to hybridize detectably more strongly to a
variant of the reference sequence mutated at the interrogation
position than to the reference sequence.
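
The relationship between the four corresponding probes can be
illustrated with a minimal sketch. The function name, the
example reference sequence and the choice to write each probe
aligned base-for-base with the reference (rather than in 5'-3'
orientation) are assumptions made for illustration only.

```python
# Illustrative sketch; names and the example sequence are hypothetical.
# Probes are written aligned base-for-base with the reference, which is
# a display simplification rather than a statement about orientation.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def corresponding_probes(reference, start, length, offset):
    """Return the four corresponding probes for one nucleotide of interest.

    reference: reference sequence over A, C, G, T
    start:     index in the reference of the first base the probe covers
    length:    length of the segment of complementarity
    offset:    index of the interrogation position within the probe
    """
    segment = reference[start:start + length]
    perfect = [COMPLEMENT[base] for base in segment]  # first probe set
    probes = {}
    for base in "ACGT":
        probe = list(perfect)
        probe[offset] = base  # only the interrogation position varies
        probes[base] = "".join(probe)
    return probes, perfect[offset]


# The interrogated reference base here is an A, so the probe from the
# first probe set carries a T at its interrogation position.
probes, matched_base = corresponding_probes("GATTACAGATTACA", 1, 7, 3)
```

In this sketch the probe keyed by the matched base is the one from
the first probe set, and the other three differ from it only at the
interrogation position.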

The probes can be oligodeoxyribonucleotides or
oligoribonucleotides, or any modified forms of these polymers
that are capable of hybridizing with a target nucleic acid
sequence by complementary base pairing. Complementary base
pairing means sequence-specific base pairing, which includes,
e.g., Watson-Crick base pairing as well as other forms of base
pairing such as Hoogsteen base pairing. Modified forms
include 2'-O-methyl oligoribonucleotides and so-called PNAs,
in which oligodeoxyribonucleotides are linked via peptide
bonds rather than phosphodiester bonds. The probes can be
attached by any linkage to a support (e.g., 3', 5' or via the
base). 3' attachment is more usual as this orientation is
compatible with the preferred chemistry for solid phase
synthesis of oligonucleotides.

The number of probes in the first probe set (and as
a consequence the number of probes in additional probe sets)
depends on the length of the reference sequence, the number of
nucleotides of interest in the reference sequence and the
number of interrogation positions per probe. In general, each
nucleotide of interest in the reference sequence requires the
same interrogation position in the four sets of probes.
Consider, as an example, a reference sequence of 100
nucleotides, 50 of which are of interest, and probes each
having a single interrogation position. In this situation,
the first probe set requires fifty probes, each having one
interrogation position corresponding to a nucleotide of
interest in the reference sequence. The second, third and
fourth probe sets each have a corresponding probe for each
probe in the first probe set, and so each also contains a
total of fifty probes. The identity of each nucleotide of
interest in the reference sequence is determined by comparing
the relative hybridization signals at four probes having
interrogation positions corresponding to that nucleotide from
the four probe sets.
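
The arithmetic of the example above can be written out as a short
sketch. The function name is hypothetical, and the treatment of
probes with several interrogation positions (a simple ceiling
division, giving a lower bound) is an assumption not stated in the
text.

```python
def probes_required(nucleotides_of_interest, positions_per_probe=1, n_sets=4):
    """Return (probes per set, probes in all sets).

    With more than one interrogation position per probe, the per-set
    figure is only a lower bound (assumption: positions never overlap).
    """
    per_set = -(-nucleotides_of_interest // positions_per_probe)  # ceiling
    return per_set, per_set * n_sets


# 50 nucleotides of interest, one interrogation position per probe.
per_set, total = probes_required(50)
```

For the example in the text this gives fifty probes in each of the
four sets, or 200 probes in total.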

In some reference sequences, every nucleotide is of
interest. In other reference sequences, only certain portions
in which variants (e.g., mutations or polymorphisms) are
concentrated are of interest. In other reference sequences,
only particular mutations or polymorphisms and immediately
adjacent nucleotides are of interest. Usually, the first
probe set has interrogation positions selected to correspond
to at least a nucleotide (e.g., representing a point mutation)
and one immediately adjacent nucleotide. Usually, the probes
in the first set have interrogation positions corresponding to
at least 3, 10, 50, 100, 1000, or 20,000 contiguous
nucleotides. The probes usually have interrogation positions
corresponding to at least 5, 10, 30, 50, 75, 90, 99 or
sometimes 100% of the nucleotides in a reference sequence.
Frequently, the probes in the first probe set completely span
the reference sequence and overlap with one another relative
to the reference sequence. For example, in one common
arrangement each probe in the first probe set differs from
another probe in that set by the omission of a 3' base
complementary to the reference sequence and the acquisition of
a 5' base complementary to the reference sequence. See
Figure 3A.

The number of probes on the chip can be quite large
(e.g., 10^5-10^6). However, often only a relatively small
proportion (i.e., less than about 50%, 25%, 10%, 5% or 1%) of
the total number of probes of a given length are selected to
pursue a particular tiling strategy. For example, a complete
set of octamer probes comprises 65,536 probes; thus, an array
of the invention typically has fewer than 32,768 octamer
probes. A complete array of decamer probes comprises
1,048,576 probes; thus, an array of the invention typically
has fewer than about 500,000 decamer probes. Often arrays
have a lower limit of 25, 50 or 100 probes and an upper limit
of 1,000,000, 100,000, 10,000 or 1000 probes. The arrays can
have other components besides the probes such as linkers
attaching the probes to a support.
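
The figures quoted for complete probe sets follow from there being
four possible nucleotides at each position. A brief sketch (the
function name is hypothetical) reproduces them:

```python
def complete_set_size(length, alphabet_size=4):
    """Number of distinct probes of a given length over A, C, G, T."""
    return alphabet_size ** length


octamers = complete_set_size(8)      # 65,536
decamers = complete_set_size(10)     # 1,048,576
half_of_octamers = octamers // 2     # 32,768, the 50% bound in the text
```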

Some advantages of the use of only a proportion of
all possible probes of a given length include: (i) each
position in the array is highly informative, whether or not
hybridization occurs; (ii) nonspecific hybridization is
minimized; (iii) it is straightforward to correlate
hybridization differences with sequence differences,
particularly with reference to the hybridization pattern of a
known standard; and (iv) the ability to address each probe
independently during synthesis, using high resolution
photolithography, allows the array to be designed and
optimized for any sequence. For example, the length of any
probe can be varied independently of the others.

For conceptual simplicity, the probes in a set are
usually arranged in order of the sequence in a lane across the
chip. A lane contains a series of overlapping probes, which
represent, or tile across, the selected reference sequence (see
Figure 3A). The components of the four sets of probes are
usually laid down in four parallel lanes, collectively
constituting a row in the horizontal direction and a series of
4-member columns in the vertical direction. Corresponding
probes from the four probe sets (i.e., complementary to the
same subsequence of the reference sequence) occupy a column.
Each probe in a lane usually differs from its predecessor in
the lane by the omission of a base at one end and the
inclusion of an additional base at the other end as shown in
Figure 3A. However, this orderly progression of probes can be
interrupted by the inclusion of control probes or omission of
probes in certain columns of the array. Such columns serve as
controls to orient the chip, or gauge the background, which
can include target sequence nonspecifically bound to the chip.

The probe sets are usually laid down in lanes such
that all probes having an interrogation position occupied by
an A form an A-lane, all probes having an interrogation
position occupied by a C form a C-lane, all probes having an
interrogation position occupied by a G form a G-lane, and all
probes having an interrogation position occupied by a T (or U)
form a T-lane (or a U-lane). Note that in this arrangement
there is not a unique correspondence between probe sets and
lanes. Thus, the probe from the first probe set is laid down
in the A-lane, C-lane, A-lane, A-lane and T-lane for the five
columns in Figure 4A. The interrogation position on a column
of probes corresponds to the position in the target sequence
whose identity is determined from analysis of hybridization to
the probes in that column. Thus, I1-I5 respectively
correspond to N1-N5 in Figure 4A. The interrogation position
can be anywhere in a probe but is usually at or near the
central position of the probe to maximize differential
hybridization signals between a perfect match and a
single-base mismatch. For example, for an 11 mer probe, the
central position is the sixth nucleotide.
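
The lane arrangement can be sketched as follows. This is an
illustration under stated assumptions: the reference sequence is
invented (it is not the sequence of Figure 4A), the function name
is hypothetical, and probes are written aligned base-for-base with
the reference.

```python
# Illustrative sketch; the reference sequence is hypothetical.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_lanes(reference, length):
    """Tile probes of a given length across a reference sequence.

    Returns the four lanes (keyed by the base at the interrogation
    position) and, for each column, the lane holding the probe from
    the first (perfectly matched) probe set.
    """
    center = length // 2  # 0-based; for an 11 mer this is the sixth base
    lanes = {base: [] for base in "ACGT"}
    perfect_lane = []
    for start in range(len(reference) - length + 1):
        window = reference[start:start + length]
        perfect = [COMPLEMENT[base] for base in window]
        perfect_lane.append(perfect[center])
        for base in "ACGT":
            probe = perfect[:center] + [base] + perfect[center + 1:]
            lanes[base].append("".join(probe))
    return lanes, perfect_lane


lanes, perfect_lane = tile_lanes("ACGTTGCA", 5)
```

Because the perfectly matched probe moves from lane to lane as the
interrogated base changes, `perfect_lane` varies by column, which is
the lack of unique correspondence between probe sets and lanes
noted above.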

Although the array of probes is usually laid down in
rows and columns as described above, such a physical
arrangement of probes on the chip is not essential. Provided
that the spatial location of each probe in an array is known,
the data from the probes can be collected and processed to
yield the sequence of a target irrespective of the physical
arrangement of the probes on a chip. In processing the data,
the hybridization signals from the respective probes can be
reassorted into any conceptual array desired for subsequent
data reduction whatever the physical arrangement of probes on
the chip.

A range of lengths of probes can be employed in the
chips. As noted above, a probe may consist exclusively of a
complementary segment, or may have one or more complementary
segments juxtaposed by flanking, trailing and/or intervening
segments. In the latter situation, the total length of
complementary segment(s) is more important than the length of
the probe. In functional terms, the complementary segment(s)
of the first probe set should be sufficiently long to allow
the probe to hybridize detectably more strongly to a reference
sequence compared with a variant of the reference including a
single base mutation at the nucleotide corresponding to the
interrogation position of the probe. Similarly, the
complementary segment(s) in corresponding probes from
additional probe sets should be sufficiently long to allow a
probe to hybridize detectably more strongly to a variant of
the reference sequence having a single nucleotide
substitution at the interrogation position relative to the
reference sequence. A probe usually has a single
complementary segment having a length of at least
3 nucleotides, and more usually at least 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 or
30 bases exhibiting perfect complementarity (other than
possibly at the interrogation position(s) depending on the
probe set) to the reference sequence. In bridging strategies,
where more than one segment of complementarity is present,
each segment provides at least three complementary nucleotides
to the reference sequence and the combined segments provide at
least two segments of three or a total of six complementary
nucleotides. As in the other strategies, the combined length
of complementary segments is typically from 6-30 nucleotides,
and preferably from about 9-21 nucleotides. The two segments
are often approximately the same length. Often, the probes
(or segment of complementarity within probes) have an odd
number of bases, so that an interrogation position can occur
in the exact center of the probe.

In some chips, all probes are the same length.
Other chips employ different groups of probe sets, in which
case the probes are of the same size within a group, but
differ between different groups. For example, some chips have
one group comprising four sets of probes as described above in
which all the probes are 11 mers, together with a second group
comprising four sets of probes in which all of the probes are
13 mers. Of course, additional groups of probes can be added.
Thus, some chips contain, e.g., four groups of probes having
sizes of 11 mers, 13 mers, 15 mers and 17 mers. Other chips
have different size probes within the same group of four probe
sets. In these chips, the probes in the first set can vary in
length independently of each other. Probes in the other sets
are usually the same length as the probe occupying the same
column from the first set. However, occasionally different
lengths of probes can be included at the same column position
in the four lanes. The different length probes are included
to equalize hybridization signals from probes irrespective of
whether A-T or C-G bonds are formed at the interrogation
position.

The length of a probe can be important in
distinguishing between a perfectly matched probe and probes
showing a single-base mismatch with the target sequence. The
discrimination is usually greater for short probes. Shorter
probes are usually also less susceptible to formation of
secondary structures. However, the absolute amount of target
sequence bound, and hence the signal, is greater for larger
probes. The probe length representing the optimum compromise
between these competing considerations may vary depending on,
inter alia, the GC content of a particular region of the target
DNA sequence, secondary structure, synthesis efficiency and
cross-hybridization. In some regions of the target, depending
on hybridization conditions, short probes (e.g., 11 mers) may
provide information that is inaccessible from longer probes
(e.g., 19 mers) and vice versa. Maximum sequence information
can be read by including several groups of different sized
probes on the chip as noted above. However, for many regions
of the target sequence, such a strategy provides redundant
information in that the same sequence is read multiple times
from the different groups of probes. Equivalent information
can be obtained from a single group of different sized probes
in which the sizes are selected to maximize readable sequence
at particular regions of the target sequence. The strategy of
customizing probe length within a single group of probe sets
minimizes the total number of probes required to read a
particular target sequence. This leaves ample capacity for
the chip to include probes to other reference sequences.

WO 98/30883 PCT/US98/06414

The invention provides an optimization block which
allows systematic variation of probe length and interrogation
position to optimize the selection of probes for analyzing a
particular nucleotide in a reference sequence. The block
comprises alternating columns of probes complementary to the
wildtype target and probes complementary to a specific
mutation. The interrogation position is varied between
columns and probe length is varied down a column.
Hybridization of the chip to the reference sequence or the
mutant form of the reference sequence identifies the probe
length and interrogation position providing the greatest
differential hybridization signal.
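
Selection of a design from an optimization block can be sketched as
a search over measured signals. The sketch assumes that
"differential hybridization signal" is taken as the difference
between matched and mismatched signals (a ratio would be an equally
plausible reading); the function name and all intensity values are
hypothetical.

```python
def best_probe_design(signals):
    """Pick the (probe length, interrogation offset) pair with the
    greatest difference between matched and mismatched signal.

    signals maps (length, offset) to (matched_signal, mismatched_signal).
    """
    return max(signals, key=lambda design: signals[design][0] - signals[design][1])


# Hypothetical intensities, for illustration only.
example_signals = {
    (11, 5): (900, 400),
    (13, 6): (1500, 1200),
    (15, 3): (1800, 1000),
}
best = best_probe_design(example_signals)
```

With these invented values the longest probe wins, although it is
not the design with the best matched-to-mismatched ratio, so the
choice of criterion matters.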

Variation of interrogation position in probes for
analyzing different regions of a target sequence offers a
number of advantages. If a segment of a target sequence
contains two closely spaced mutations, m1 and m2, and probes
for analyzing that segment have an interrogation position at
or near the middle, then no probe has an interrogation
position aligned with one of the mutations without overlapping
the other mutation (see first probe in Figure 4B). Thus, the
presence of a mutation would have to be detected by comparing
the hybridization signal of a single-mismatched probe with a
double-mismatched probe. By contrast, if the interrogation
position is near the 3' end of the probes, probes can have
their interrogation position aligned with m1 without
overlapping m2 (second probe in Figure 4B). Thus, the
mutation can be detected by a comparison of a perfectly
matched probe with single-base mismatched probes. Similarly,
if the interrogation position is near the 5' end of the
probes, probes can have their interrogation position aligned
with m2 without overlapping m1 (third probe in Figure 4B).

Variation of the interrogation position also offers
the advantage of reducing loss of signal due to self-annealing
of certain probes. Figure 4C shows a target sequence having a
nucleotide X, which can be read either from the relative
signals of the four probes having a central interrogation
position (shown at the left of the figure) or from the four
probes having the interrogation position near the three prime
end (shown at the right of the figure). Only the probes
having the central interrogation position are capable of
self-annealing. Thus, a higher signal is obtained from the
probes having the interrogation position near the terminus.
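
The argument concerning two closely spaced mutations, m1 and m2,
can be checked with a small sketch. Positions are expressed in
reference coordinates; the function names, the positions and the
probe length are hypothetical, and which probe end corresponds to
3' or 5' depends on probe orientation, which the sketch does not
model.

```python
def probe_span(target_pos, length, offset):
    """Inclusive reference coordinates covered by a probe whose
    interrogation position (index `offset` within the probe) is
    aligned with `target_pos`."""
    start = target_pos - offset
    return start, start + length - 1


def overlaps(target_pos, other_pos, length, offset):
    """True if a probe interrogating target_pos also covers other_pos."""
    start, end = probe_span(target_pos, length, offset)
    return start <= other_pos <= end


m1, m2, length = 20, 24, 11  # hypothetical positions and probe length
central = overlaps(m1, m2, length, offset=5)     # True: probe also spans m2
near_end = overlaps(m1, m2, length, offset=9)    # False: m2 lies outside
other_end = overlaps(m2, m1, length, offset=1)   # False: m1 lies outside
```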

The probes are designed to be complementary to
either strand of the reference sequence (e.g., coding or
non-coding). Some chips contain separate groups of probes, one
complementary to the coding strand, the other complementary to
the noncoding strand. Independent analysis of coding and
noncoding strands provides largely redundant information.
However, the regions of ambiguity in reading the coding strand
are not always the same as those in reading the noncoding
strand. Thus, combination of the information from coding and
noncoding strands increases the overall accuracy of
sequencing.

Some chips contain additional probes or groups of
probes designed to be complementary to a second reference
sequence. The second reference sequence is often a
subsequence of the first reference sequence bearing one or
more commonly occurring mutations or interstrain variations.
The second group of probes is designed by the same principles
as described above except that the probes exhibit
complementarity to the second reference sequence. The
inclusion of a second group is particularly useful for
analyzing short subsequences of the primary reference sequence
in which multiple mutations are expected to occur within a
short distance commensurate with the length of the probes
(i.e., two or more mutations within 9 to 21 bases). Of course,
the same principle can be extended to provide chips containing
groups of probes for any number of reference sequences.
Alternatively, the chips may contain additional probe(s) that
do not form part of a tiled array as noted above, but rather
serve as probe(s) for a conventional reverse dot blot. For
example, the presence of a mutation can be detected from
binding of a target sequence to a single oligomeric probe
harboring the mutation. Preferably, an additional probe
containing the equivalent region of the wildtype sequence is
included as a control.

Although only a subset of probes is required to
analyze a particular target sequence, it is quite possible
that other probes superfluous to the contemplated analysis are
also included on the chip. In the extreme case, the chip
could contain a complete set of all probes of a given length
notwithstanding that only a small subset is required to
analyze the particular reference sequence of interest.
Although such a situation might appear wasteful of resources,
a chip including a complete set of probes offers the advantage
of including the appropriate subset of probes for analyzing
any reference sequence. Such a chip also allows simultaneous
analysis of a reference sequence from different subsets of
probes (e.g., subsets having the interrogation site at
different positions in the probe).

In its simplest terms, the analysis of a chip
reveals whether the target sequence is the same or different
from the reference sequence. If the two are the same, all
probes in the first probe set show a stronger hybridization
signal than corresponding probes from other probe sets. If
the two are different, most probes from the first probe set
still show a stronger hybridization signal than corresponding
probes from the other probe sets, but some probes from the
first probe set do not. Thus, when a probe from another probe
set lights up more strongly than the corresponding probe from
the first probe set, this provides a simple visual indication
that the target sequence and reference sequence differ.

The chips also reveal the nature and position of
differences between the target and reference sequence. The
chips are read by comparing the intensities of labelled target
bound to the probes in an array. Specifically, for each
nucleotide of interest in the target sequence, a comparison is
performed between probes having an interrogation position
aligned with that position. These probes form a column
(actual or conceptual) on the chip. For example, a column
often contains one probe from each of A, C, G and T lanes.
The nucleotide in the target sequence is identified as the
complement of the nucleotide occupying the interrogation
position in the probe showing the highest hybridization signal
from a column. Fig. 6 shows the hybridization pattern of a
chip hybridized to its reference sequence. The dark square in
each column represents the probe from the column having the
highest hybridization signal. The sequence can be read by
following the pattern of dark squares from left to right
across the chip. The first dark square is in the A-lane,
indicating that the nucleotide occupying the interrogation
position of the probe represented by this square is an A. The
first nucleotide in the reference sequence is the complement
of the nucleotide occupying the interrogation position of this
probe (i.e., a T). Similarly, the second dark square is in
the T-lane, from which it can be deduced that the second
nucleotide in the reference sequence is an A. Likewise the
third dark square is in the T-lane, from which it can be
deduced that the third nucleotide in the reference sequence is
also an A, and so forth. By including probes in the first
probe set (and by implication in the other probe sets) with
interrogation positions corresponding to every nucleotide in a
reference sequence, it is possible to read substantially every
nucleotide in a target sequence, thereby revealing the
complete or nearly complete sequence of the target.
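
The reading rule described above (call the complement of the
interrogation base of the brightest probe in each column) can be
sketched briefly. The function name and the intensity values are
hypothetical; the three example columns mirror the A-lane, T-lane,
T-lane pattern discussed for Fig. 6.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def read_sequence(columns):
    """Read a target sequence from per-column lane intensities.

    columns is a list of dicts mapping lane ("A", "C", "G", "T") to the
    hybridization signal of the probe in that lane.
    """
    calls = []
    for column in columns:
        brightest_lane = max(column, key=column.get)
        calls.append(COMPLEMENT[brightest_lane])
    return "".join(calls)


# Hypothetical intensities: brightest lanes are A, T, T.
columns = [
    {"A": 950, "C": 120, "G": 90, "T": 140},
    {"A": 100, "C": 80, "G": 130, "T": 870},
    {"A": 110, "C": 95, "G": 60, "T": 910},
]
sequence = read_sequence(columns)  # "TAA"
```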

Of the four probes in a column, only one can
exhibit a perfect match to the target sequence whereas the
others usually exhibit at least a one base pair mismatch. The
probe exhibiting a perfect match usually produces a
substantially greater hybridization signal than the other
three probes in the column and is thereby easily identified.
However, in some regions of the target sequence, the
distinction between a perfect match and a one-base mismatch is
less clear. Thus, a call ratio is established to define the
ratio of signal from the best hybridizing probe to the second
best hybridizing probe that must be exceeded for a particular
target position to be read from the probes. A high call ratio
ensures that few if any errors are made in calling target
nucleotides, but can result in some nucleotides being scored
as ambiguous, which could in fact be accurately read. A lower
call ratio results in fewer ambiguous calls, but can result in
more erroneous calls. It has been found that at a call ratio
of 1.2 virtually all calls are accurate. However, a small but
significant number of bases (e.g., up to about 10%) may have
to be scored as ambiguous.
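
Application of a call ratio can be sketched as follows. The
function name, the use of "N" to mark an ambiguous position and the
intensity values are assumptions made for illustration; the default
ratio of 1.2 is the value mentioned above.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(column, call_ratio=1.2):
    """Call one target nucleotide from a column of four lane signals.

    Returns the complement of the brightest lane's base if the best
    signal exceeds call_ratio times the second best, else "N"
    (ambiguous; the "N" convention is an assumption of this sketch).
    """
    ranked = sorted(column.items(), key=lambda item: item[1], reverse=True)
    (best_lane, best), (_, second) = ranked[0], ranked[1]
    if best > call_ratio * second:
        return COMPLEMENT[best_lane]
    return "N"


clear = call_base({"A": 950, "C": 120, "G": 90, "T": 140})     # "T"
ambiguous = call_base({"A": 500, "C": 450, "G": 50, "T": 40})  # "N"
```

Raising `call_ratio` in this sketch converts more marginal columns
to "N", which corresponds to the trade-off between erroneous and
ambiguous calls described above.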

Although small regions of the target sequence can
sometimes be ambiguous, these regions usually occur at the
same or similar segments in different target sequences. Thus,
for precharacterized mutations, it is known in advance whether
that mutation is likely to occur within a region of
unambiguously determinable sequence.

An array of probes is most useful for analyzing the
reference sequence from which the probes were designed and
variants of that sequence exhibiting substantial sequence
similarity with the reference sequence (e.g., several
single-base mutants spaced over the reference sequence). When
an array is used to analyze the exact reference sequence from
which it was designed, one probe exhibits a perfect match to
the reference sequence, and the other three probes in the same
column exhibit single-base mismatches. Thus, discrimination
between hybridization signals is usually high and accurate
sequence is obtained. High accuracy is also obtained when an
array is used for analyzing a target sequence comprising a
variant of the reference sequence that has a single mutation
relative to the reference sequence, or several widely spaced
mutations relative to the reference sequence. At different
mutant loci, one probe exhibits a perfect match to the target,
and the other three probes occupying the same column exhibit
single-base mismatches, the difference (with respect to
analysis of the reference sequence) being the lane in which
the perfect match occurs.

For target sequences showing a high degree of
divergence from the reference strain or incorporating several
closely spaced mutations from the reference strain, a single
group of probes (i.e., designed with respect to a single
reference sequence) will not always provide accurate sequence
for the highly variant region of this sequence. At some
particular columnar positions, it may be that no single probe
exhibits perfect complementarity to the target and that any
comparison must be based on different degrees of mismatch
between the four probes. Such a comparison does not always
allow the target nucleotide corresponding to that columnar
position to be called. Deletions in target sequences can be
detected by loss of signal from probes having interrogation
positions encompassed by the deletion. However, signal may
also be lost from probes having interrogation positions
closely proximal to the deletion resulting in some regions of
the target sequence that cannot be read. Target sequences
bearing insertions will also exhibit short regions including
and proximal to the insertion that usually cannot be read.

The presence of short regions of difficult-to-read
target because of closely spaced mutations, insertions or
deletions, does not prevent determination of the remaining
sequence of the target as different regions of a target
sequence are determined independently. Moreover, such
ambiguities as might result from analysis of diverse variants
with a single group of probes can be avoided by including
multiple groups of probe sets on a chip. For example, one
group of probes can be designed based on a full-length
reference sequence, and the other groups on subsequences of
the reference sequence incorporating frequently occurring
mutations or strain variations.

A particular advantage of the present sequencing
strategy over conventional sequencing methods is the capacity
simultaneously to detect and quantify proportions of multiple
target sequences. Such capacity is valuable, e.g., for
diagnosis of patients who are heterozygous with respect to a
gene or who are infected with a virus, such as HIV, which is
usually present in several polymorphic forms. Such capacity
is also useful in analyzing targets from biopsies of tumor
cells and surrounding tissues. The presence of multiple
target sequences is detected from the relative signals of the
four probes at the array columns corresponding to the target
nucleotides at which diversity occurs. The relative signals
of the four probes for the mixture under test are compared
with the corresponding signals from a homogeneous reference
sequence. An increase in a signal from a probe that is
mismatched with respect to the reference sequence, and a
corresponding decrease in the signal from the probe which is
matched with the reference sequence, signal the presence of a
mutant strain in the mixture. The extent of the shift in
hybridization signals of the probes is related to the
proportion of a target sequence in the mixture. Shifts in
relative hybridization signals can be quantitatively related
to proportions of reference and mutant sequence by prior
calibration of the chip with seeded mixtures of the mutant and
reference sequences. By this means, a chip can be used to
detect variant or mutant strains constituting as little as 1,
5, 20, or 25% of a mixture of strains.
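
Relating a signal shift to a mixture proportion through prior
calibration can be sketched as below. Everything specific here is
an assumption: the text does not say how the shift is expressed or
how the calibration is interpolated, so the sketch uses the mutant
probe's share of the combined signal and simple linear
interpolation, and the calibration points are invented.

```python
def mutant_signal_fraction(reference_probe_signal, mutant_probe_signal):
    """Share of the combined signal contributed by the mutant-matched probe."""
    return mutant_probe_signal / (reference_probe_signal + mutant_probe_signal)


def estimate_proportion(fraction, calibration):
    """Linearly interpolate a mutant proportion from calibration points.

    calibration is a list of (observed signal fraction, known mutant
    proportion) pairs from seeded mixtures, with distinct fractions.
    """
    points = sorted(calibration)
    if fraction <= points[0][0]:
        return points[0][1]
    if fraction >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fraction <= x1:
            return y0 + (y1 - y0) * (fraction - x0) / (x1 - x0)
    return points[-1][1]


# Hypothetical calibration from seeded mixtures, for illustration only.
calibration = [(0.10, 0.0), (0.30, 0.25), (0.50, 0.50), (0.90, 1.0)]
fraction = mutant_signal_fraction(600, 400)
proportion = estimate_proportion(fraction, calibration)
```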

Similar principles allow the simultaneous analysis
of multiple target sequences even when none is identical to
the reference sequence. For example, with a mixture of two
target sequences bearing first and second mutations, there
would be a variation in the hybridization patterns of probes
having interrogation positions corresponding to the first and
second mutations relative to the hybridization pattern with
the reference sequence. At each position, one of the probes
having a mismatched interrogation position relative to the
reference sequence would show an increase in hybridization
signal, and the probe having a matched interrogation position
relative to the reference sequence would show a decrease in
hybridization signal. Analysis of the hybridization pattern
of the mixture of mutant target sequences, preferably in
comparison with the hybridization pattern of the reference
sequence, indicates the presence of two mutant target
sequences, the position and nature of the mutation in each
strain, and the relative proportions of each strain.

In a variation of the above method, several target
sequences are differentially labelled before being
simultaneously applied to the array. For example, each
different target sequence can be labelled with a fluorescent
label emitting at a different wavelength. After applying a
mixture of target sequences to the array, the individual
target sequences can be distinguished and independently
analyzed by virtue of the differential labels. For example,
target sequences obtained from a patient at different stages
of a disease can be differently labelled and analyzed
simultaneously, facilitating identification of new
mutations.

2. Omission of Probes

The basic strategy outlined above employs four
probes to read each nucleotide of interest in a target
sequence. One probe (from the first probe set) shows a
perfect match to the reference sequence and the other three
probes (from the second, third and fourth probe sets) exhibit
a mismatch with the reference sequence and a perfect match
with a target sequence bearing a mutation at the nucleotide of
interest. The provision of three probes from the second,
third and fourth probe sets allows detection of each of the
three possible nucleotide substitutions of any nucleotide of
interest. However, in some reference sequences or regions of
reference sequences, it is known in advance that only certain
mutations are likely to occur. Thus, for example, at one site
it might be known that an A nucleotide in the reference
sequence may exist as a T mutant in some target sequences but
is unlikely to exist as a C or G mutant. Accordingly, for
analysis of this region of the reference sequence, one might
include only the first and second probe sets, the first probe
set exhibiting perfect complementarity to the reference
sequence, and the second probe set having an interrogation
position occupied by an invariant A residue (for detecting the
T mutant). In other situations, one might include the first,
second and third probe sets (but not the fourth) for
detection of a wildtype nucleotide in the reference sequence
and two mutant variants thereof in target sequences. In some
chips, probes that would detect silent mutations (i.e., not
affecting amino acid sequence) are omitted.
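
The omission of probes described above can be sketched in a few
lines of Python. This is only an illustration, not the method of
the invention: the function name tile_position, its parameters
and the convention of writing each probe aligned base-for-base
with the reference segment are assumptions made for the example.

```python
# Illustrative sketch only; names and conventions are assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def tile_position(reference_segment, i, likely_mutants=None):
    """Return {target base detected: probe} for one nucleotide of interest.

    Probes are written aligned base-for-base with the reference segment,
    so each probe base is the complement of the base it pairs with.
    If likely_mutants is None, all three substitutions are tiled.
    """
    ref_base = reference_segment[i]
    if likely_mutants is None:
        likely_mutants = [b for b in BASES if b != ref_base]
    matched = "".join(COMPLEMENT[b] for b in reference_segment)
    probes = {ref_base: matched}  # probe from the first probe set
    for mutant in likely_mutants:
        # Mismatched probe: only the interrogation position changes.
        probes[mutant] = matched[:i] + COMPLEMENT[mutant] + matched[i + 1:]
    return probes


full = tile_position("GCATG", 2)            # basic strategy: four probes
reduced = tile_position("GCATG", 2, ["T"])  # only the T mutant is expected
```

In the reduced tiling, the probe that detects the T mutant carries
an A at the interrogation position, as in the example given in
the text.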






WO 98/30883 PCT/US98/06414 


Some chips effectively contain the second, third and,
optionally, the fourth probe sets described in the basic
tiling strategy (i.e., the mismatched probe sets) but omit
some or all of the probes from the first probe set (i.e.,
perfectly matched probes). Therefore, such chips comprise at
least two probe sets, which will arbitrarily be referred to as
probe sets A and B (to avoid confusion with the nomenclature
used to describe the four probe sets in the basic tiling
strategy). Probe set A has a plurality of probes. Each probe
comprises a segment exactly complementary to a subsequence of
a reference sequence except in at least one interrogation
position. The interrogation position corresponds to a
nucleotide in the reference sequence juxtaposed with the
interrogation position when the reference sequence and probe
are maximally aligned. Probe set B has a corresponding probe
for each probe in probe set A. The corresponding
probe in probe set B is identical to a sequence comprising the
corresponding probe from probe set A or a subsequence
thereof that includes the at least one (and usually only one)
interrogation position except that the at least one
interrogation position is occupied by a different nucleotide
in each of the two corresponding probes from the probe sets A
and B. An additional probe set C, if present, also comprises
a corresponding probe for each probe in the probe set A except
in the at least one interrogation position, which differs in
the corresponding probes from probe sets A, B and C. The
arrangement of probe sets A, B and C is shown in Figure 3B.
Figure 3B is the same as Figure 3A except that the first probe
set has been omitted and the second, third and fourth probe
sets in Figure 3A have been relabelled as probe sets A, B and
C in Figure 3B.

Chips lacking perfectly matched probes are 
preferably analyzed by hybridization to both target and 
reference sequences. The hybridizations can be performed
sequentially, or, if the target and reference are
differentially labelled, concurrently. The hybridization data
are then analyzed in two ways. First, considering only the
hybridization signals of the probes to the target sequence,
one compares the signals of corresponding probes for each
position of interest in the target sequence. For a position 
of mismatch with the reference sequence, one of the probes 
having an interrogation position aligned with that position in 
the target sequence shows a substantially higher signal than 
other corresponding probes. The nucleotide occupying the 
position of mismatch in the target sequence is the complement 
of the nucleotide occupying the interrogation position of the 
corresponding probe showing the highest signal. For a 
position where target and reference sequence are the same, 
none of the corresponding probes having an interrogation 
position aligned with that position in the target sequence is 
matched, and corresponding probes generally show weak signals, 
which may vary somewhat from each other. 

In a second level of analysis, the ratio of 
hybridization signals to the target and reference sequences is 
determined for each probe in the array. For most probes in 
the array the ratio of hybridization signals is about the 
same. For such a probe, it can be deduced that the 
interrogation position of the probe corresponds to a 
nucleotide that is the same in target and reference sequences. 
A few probes show a much higher ratio of target hybridization 
to reference hybridization than the majority of probes. For 
such a probe, it can be deduced that the interrogation 
position of the probe corresponds to a nucleotide that differs 
between target and reference sequences, and that in the 
target, this nucleotide is the complement of the nucleotide 
occupying the interrogation position of the probe. The second 
level of analysis serves as a control to confirm the 
identification of differences between target and reference 
sequence from the first level of analysis. 
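
The two levels of analysis can be illustrated with a short Python
sketch. It is a hedged example rather than a prescribed
procedure: the function names, the threshold min_fold and the
signal values are hypothetical, and probes are keyed by an
assumed (position, interrogation base) pair.

```python
from statistics import median

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def first_level_call(target_signals, min_fold=3.0):
    """target_signals: {probe interrogation base: signal} at one position.

    Returns the target base if one probe stands out, else None
    (the position is taken to match the reference).
    """
    ranked = sorted(target_signals.items(), key=lambda kv: kv[1], reverse=True)
    (top_base, top), (_, runner_up) = ranked[0], ranked[1]
    if top > 0 and top >= min_fold * runner_up:
        return COMPLEMENT[top_base]
    return None


def second_level_flags(target, reference, min_fold=3.0):
    """Return probes whose target/reference signal ratio is far above
    the typical (median) ratio. Reference signals must be nonzero."""
    ratios = {probe: target[probe] / reference[probe] for probe in target}
    typical = median(ratios.values())
    return sorted(p for p, r in ratios.items() if r >= min_fold * typical)


# Hypothetical signals for probe sets A, B and C at two positions.
target = {(10, "A"): 40.0, (10, "C"): 35.0, (10, "G"): 400.0,
          (11, "A"): 30.0, (11, "C"): 45.0, (11, "G"): 38.0}
reference = {(10, "A"): 42.0, (10, "C"): 33.0, (10, "G"): 41.0,
             (11, "A"): 31.0, (11, "C"): 44.0, (11, "G"): 39.0}


def signals_at(position):
    return {base: s for (pos, base), s in target.items() if pos == position}


call_10 = first_level_call(signals_at(10))       # probe G stands out
call_11 = first_level_call(signals_at(11))       # all probes weak
confirmed = second_level_flags(target, reference)
```

Here the second level flags the same probe that the first level
singled out, which is the confirming role the text assigns to it.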

3. Wildtype Probe Lane

When the chips comprise four probe sets, as
discussed supra, and the probe sets are laid down in four
lanes, an A lane, a C lane, a G lane and a T or U lane, the
probe having a segment exhibiting perfect complementarity to a
reference sequence varies between the four lanes from one
column to another. This does not present any significant
difficulty in computer analysis of the data from the chip.
However, visual inspection of the hybridization pattern of the
chip is sometimes facilitated by provision of an extra lane of
probes, in which each probe has a segment exhibiting perfect
complementarity to the reference sequence. See Figure 4A.
This extra lane of probes is called the wildtype lane and
contains only probes from the first probe set. Each wildtype
lane probe has a segment that is identical to a segment from
one of the probes in the other four lanes (which lane
depending on the column position). The wildtype lane
hybridizes to a target sequence at all nucleotide positions
except those in which deviations from the reference sequence
occur. The hybridization pattern of the wildtype lane
thereby provides a simple visual indication of mutations.

4. Deletion, Insertion and Multiple-Mutation Probes

Some chips provide an additional probe set
specifically designed for analyzing deletion mutations. The
additional probe set comprises a probe corresponding to each
probe in the first probe set as described above. However, a
probe from the additional probe set differs from the
corresponding probe in the first probe set in that the
nucleotide occupying the interrogation position is deleted in
the probe from the additional probe set. See Fig. 6.
Optionally, the probe from the additional probe set bears an
additional nucleotide at one of its termini relative to the
corresponding probe from the first probe set (shown in
brackets in Fig. 6). The probe from the additional probe set
will hybridize more strongly than the corresponding probe from
the first probe set to a target sequence having a single base
deletion at the nucleotide corresponding to the interrogation
position. Additional probe sets are provided in which not
only the interrogation position, but also an adjacent
nucleotide is deleted.

Similarly, other chips provide additional probe sets
for analyzing insertions. For example, one additional probe
set has a probe corresponding to each probe in the first probe
set as described above. However, the probe in the additional
probe set has an extra T nucleotide inserted adjacent to the
interrogation position. See Fig. 6 (the extra T is shown in a
square box). Optionally, the probe has one fewer nucleotide
at one of its termini relative to the corresponding probe from
the first probe set (shown in brackets). The probe from the
additional probe set hybridizes more strongly than the
corresponding probe from the first probe set to a target
sequence having an A insertion to the left of nucleotide
"n" of the reference sequence in Fig. 6. Similar additional probe
sets can be constructed having C, G or A nucleotides inserted
adjacent to the interrogation position.

Usually, four such additional probe sets, one for
each nucleotide, are used in combination. Comparison of the
hybridization signal of the probes from the additional probe
sets with the corresponding probe from the first probe set
indicates whether the target sequence contains an insertion.
For example, if a probe from one of the additional probe sets
shows a higher hybridization signal than a corresponding probe
from the first probe set, it is deduced that the target
sequence contains an insertion adjacent to the corresponding
nucleotide (n) in the target sequence. The inserted base in
the target is the complement of the inserted base in the probe
from the additional probe set showing the highest
hybridization signal. If the corresponding probe from the
first probe set shows a higher hybridization signal than the
corresponding probes from the additional probe sets, then the
target sequence does not contain an insertion to the left of
the corresponding position ("n" in Fig. 6) in the target
sequence.
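
A minimal Python sketch of the deletion and insertion probe sets
and of the insertion call follows. The function names, the
choice to insert the extra base immediately before the
interrogation index, and the signal values are assumptions made
for illustration; the optional adjustment of probe termini
mentioned above is not modelled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def deletion_probe(probe, i):
    """Probe from the first probe set with the interrogation base removed."""
    return probe[:i] + probe[i + 1:]


def insertion_probes(probe, i):
    """Four probes, each with one extra base inserted next to position i
    (here: immediately before it, an arbitrary choice for the sketch)."""
    return {b: probe[:i] + b + probe[i:] for b in BASES}


def call_insertion(first_set_signal, insertion_signals):
    """Return the base inserted in the target, or None if the probe from
    the first probe set gives the higher signal."""
    best = max(insertion_signals, key=insertion_signals.get)
    if insertion_signals[best] > first_set_signal:
        return COMPLEMENT[best]  # target base pairs with the probe's extra base
    return None


probe = "ACGTACG"  # hypothetical probe, interrogation position at index 3
del_probe = deletion_probe(probe, 3)
ins_probes = insertion_probes(probe, 3)
inserted = call_insertion(100.0, {"A": 20.0, "C": 30.0, "G": 25.0, "T": 400.0})
no_insert = call_insertion(400.0, {"A": 20.0, "C": 30.0, "G": 25.0, "T": 60.0})
```

With these made-up signals, the probe bearing the extra T is
brightest, so the target is inferred to carry an A insertion.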

Other chips provide additional probes (multiple-mutation
probes) for analyzing target sequences having
multiple closely spaced mutations. A multiple-mutation probe
is usually identical to a corresponding probe from the first
set as described above, except in the base occupying the
interrogation position, and except at one or more additional
positions, corresponding to nucleotides in which substitution
may occur in the reference sequence. The one or more
additional positions in the multiple-mutation probe are
occupied by nucleotides complementary to the nucleotides
occupying corresponding positions in the reference sequence
when the possible substitutions have occurred.

5. Block Tiling 
In block tiling, a perfectly matched (or wildtype)
probe is compared with multiple sets of mismatched or mutant
probes. The perfectly matched probe and the multiple sets of
mismatched probes with which it is compared collectively form
a group or block of probes on the chip. Each set comprises at
least one, and usually three, mismatched probes. Fig. 7 shows
a perfectly matched probe (CAATCGA) having three interrogation
positions (I1, I2 and I3). The perfectly matched probe is
compared with three sets of probes (arbitrarily designated A,
B and C), each having three mismatched probes. In set A, the
three mismatched probes are identical to a sequence comprising
the perfectly matched probe or a subsequence thereof including
the interrogation positions, except at the first interrogation
position. That is, the mismatched probes in the set A differ
from the perfectly matched probe at the first
interrogation position. Thus, the relative hybridization
signals of the perfectly matched probe and the mismatched
probes in the set A indicate the identity of the nucleotide
in a target sequence corresponding to the first interrogation
position. This nucleotide is the complement of the nucleotide
occupying the interrogation position of the probe showing the
highest signal. Similarly, set B comprises three mismatched
probes that differ from the perfectly matched probe at the
second interrogation position. The relative hybridization
intensities of the perfectly matched probe and the three
mismatched probes of set B reveal the identity of the
nucleotide in the target sequence corresponding to the second
interrogation position (i.e., n2 in Fig. 7). Similarly, the
three mismatched probes in set C in Fig. 7 differ from the
perfectly matched probe at the third interrogation position.
Comparison of the hybridization intensities of the perfectly
matched probe and the mismatched probes in the set C reveals
the identity of the nucleotide in the target sequence
corresponding to the third interrogation position (n3).

As noted above, a perfectly matched probe may have
seven or more interrogation positions. If there are seven
interrogation positions, there are seven sets of three
mismatched probes, each set serving to identify the nucleotide
corresponding to one of the seven interrogation positions.
Similarly, if there are 20 interrogation positions in the
perfectly matched probe, then 20 sets of three mismatched
probes are employed. As in other tiling strategies, selected
probes can be omitted if it is known in advance that only
certain types of mutations are likely to arise.
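
As a rough illustration of how a block is assembled, the Python
sketch below builds the mismatched sets for the probe CAATCGA.
The positions chosen as interrogation positions (indices 2, 3
and 4) are an assumption, since Fig. 7 is not reproduced here,
and the function names are hypothetical.

```python
BASES = "ACGT"


def block_of_probes(perfect_match, interrogation_positions):
    """Return {interrogation position: three mismatched probes}.

    Each mismatched probe differs from the perfectly matched probe
    only at the given interrogation position.
    """
    return {
        pos: [perfect_match[:pos] + b + perfect_match[pos + 1:]
              for b in BASES if b != perfect_match[pos]]
        for pos in interrogation_positions
    }


def probe_counts(n_positions):
    """Probes needed to read n positions: one shared perfectly matched
    probe plus three mismatches per position in block tiling, versus
    four probes per position in the basic strategy."""
    return {"block": 1 + 3 * n_positions, "basic": 4 * n_positions}


block = block_of_probes("CAATCGA", [2, 3, 4])  # sets A, B and C
counts = probe_counts(7)
```

For seven interrogation positions this gives 22 probes in a block
against 28 in the basic strategy.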

Each block of probes allows short regions of a
target sequence to be read. For example, for a block of
probes having seven interrogation positions, seven nucleotides
in the target sequence can be read. Of course, a chip can
contain any number of blocks depending on how many nucleotides
of the target are of interest. The hybridization signals for
each block can be analyzed independently of any other block.
The block tiling strategy can also be combined with other
tiling strategies, with different parts of the same reference
sequence being tiled by different strategies.

The block tiling strategy is a species of the basic
tiling strategy discussed above, in which the probe from the
first probe set has more than one interrogation position. The
perfectly matched probe in the block tiling strategy is
equivalent to a probe from the first probe set in the basic
tiling strategy. The three mismatched probes in set A in
block tiling are equivalent to probes from the second, third
and fourth probe sets in the basic tiling strategy. The three
mismatched probes in set B of block tiling are equivalent to
probes from additional probe sets in basic tiling arbitrarily
designated the fifth, sixth and seventh probe sets. The three
mismatched probes in set C of block tiling are equivalent
to probes from three further probe sets in basic tiling
arbitrarily designated the eighth, ninth and tenth probe sets.

The block tiling strategy offers two advantages over
a basic strategy in which each probe in the first set has a
single interrogation position. One advantage is that the same
sequence information can be obtained from fewer probes. A
second advantage is that each of the probes constituting a
block (i.e., a probe from the first probe set and a
corresponding probe from each of the other probe sets) can
have identical 3' and 5' sequences, with the variation
confined to a central segment containing the interrogation
positions. The identity of 3' sequence between different
probes simplifies the strategy for solid phase synthesis of
the probes on the chip and results in more uniform deposition
of the different probes on the chip, thereby in turn
increasing the uniformity of signal to noise ratio for
different regions of the chip.

6. Multiplex Tiling

In the block tiling strategy discussed above, the
identity of a nucleotide in a target or reference sequence is
determined by comparison of hybridization patterns of one
probe having a segment showing a perfect match with that of
other probes (usually three other probes) showing a single
base mismatch. In multiplex tiling, the identity of at least
two nucleotides in a reference or target sequence is
determined by comparison of hybridization signal intensities
of four probes, two of which have a segment showing perfect
complementarity or a single base mismatch to the reference
sequence, and two of which have a segment showing perfect
complementarity or a double-base mismatch to the reference
sequence. The
four probes whose hybridization patterns are to be compared
each have a segment that is exactly complementary to a
reference sequence except at two interrogation positions, in
which the segment may or may not be complementary to the
reference sequence. The interrogation positions correspond to
the nucleotides in a reference or target sequence which are
determined by the comparison of intensities. The nucleotides
occupying the interrogation positions in the four probes are
selected according to the following rule. The first
interrogation position is occupied by a different nucleotide
in each of the four probes. The second interrogation position
is also occupied by a different nucleotide in each of the four
probes. In two of the four probes, designated the first and
second probes, the segment is exactly complementary to the
reference sequence except at not more than one of the two
interrogation positions. In other words, one of the
interrogation positions is occupied by a nucleotide that is
complementary to the corresponding nucleotide from the
reference sequence and the other interrogation position may or
may not be so occupied. In the other two of the four probes,
designated the third and fourth probes, the segment is exactly
complementary to the reference sequence except that both
interrogation positions are occupied by nucleotides which are
noncomplementary to the respective corresponding nucleotides
in the reference sequence.

There are a number of ways of satisfying these
conditions depending on whether the two nucleotides in the
reference sequence corresponding to the two interrogation
positions are the same or different. If these two nucleotides
are different in the reference sequence (probability 3/4), the
conditions are satisfied by each of the two interrogation
positions being occupied by the same nucleotide in any given
probe. For example, in the first probe, the two interrogation
positions would both be A, in the second probe, both would be
C, in the third probe, each would be G, and in the fourth
probe each would be T or U. If the two nucleotides in the
reference sequence corresponding to the two interrogation
positions are the same, the conditions noted above are
satisfied by each of the interrogation positions in any one of
the four probes being occupied by complementary nucleotides.
For example, in the first probe, the interrogation positions
could be occupied by A and T, in the second probe by C and G,
in the third probe by G and C, and in the fourth probe, by T and
A. See Fig. 8.

When the four probes are hybridized to a target that
is the same as the reference sequence or differs from the
reference sequence at one (but not both) of the interrogation
positions, two of the four probes show a double mismatch with
the target and two probes show a single mismatch. The
identity of probes showing these different degrees of mismatch
can be determined from the different hybridization signals.
From the identity of the probes showing the different degrees
of mismatch, the nucleotides occupying both of the
interrogation positions in the target sequence can be deduced.
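
The selection rule for the two interrogation positions can be
checked with a small Python sketch. The function names are
hypothetical, and mismatches are simply counted rather than
derived from hybridization signals, so this is a consistency
check on the rule under those assumptions, not an analysis
procedure.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def multiplex_pairs(ref_n1, ref_n2):
    """Bases placed at the (first, second) interrogation positions of the
    four probes, given the two reference nucleotides of interest."""
    if ref_n1 != ref_n2:
        return [(b, b) for b in BASES]          # A/A, C/C, G/G, T/T
    return [(b, COMPLEMENT[b]) for b in BASES]  # A/T, C/G, G/C, T/A


def mismatches(pair, n1, n2):
    """Number of interrogation positions not complementary to n1, n2."""
    return int(pair[0] != COMPLEMENT[n1]) + int(pair[1] != COMPLEMENT[n2])


def mismatch_pattern(ref_n1, ref_n2, target_n1, target_n2):
    """Mismatch count of each of the four probes against a target."""
    return [mismatches(p, target_n1, target_n2)
            for p in multiplex_pairs(ref_n1, ref_n2)]


same_as_reference = mismatch_pattern("A", "C", "A", "C")
one_substitution = mismatch_pattern("A", "C", "G", "C")
```

For every choice of the two reference nucleotides, hybridizing to
the reference itself gives two single-mismatch probes and two
double-mismatch probes, as the rule requires.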

For ease of illustration, the multiplex strategy has
been initially described for the situation where there are two
nucleotides of interest in a reference sequence and only four
probes in an array. Of course, the strategy can be extended
to analyze any number of nucleotides in a target sequence by
using additional probes. In one variation, each pair of
interrogation positions is read from a unique group of four
probes. In a block variation, different groups of four probes
exhibit the same segment of complementarity with the reference
sequence, but the interrogation positions move within a block.
The block and standard multiplex tiling variants can of course
be used in combination for different regions of a reference
sequence. Either or both variants can also be used in
combination with any of the other tiling strategies described.

7. Helper Mutations 

Occasionally, small regions of a reference sequence
give a low hybridization signal as a result of self-annealing of
probes. The self-annealing reduces the amount of probe
effectively available for hybridizing to the target. Although
such regions of the target are generally small and the
reduction of hybridization signal is usually not so
substantial as to obscure the sequence of this region, this
concern can be avoided by the use of probes incorporating
helper mutations. A helper mutation refers to a position of
mismatch in a probe other than at an interrogation position.
The helper mutation(s) serve to break up regions of internal
complementarity within a probe and thereby prevent annealing.
Usually, one or two helper mutations are quite sufficient for
this purpose. The inclusion of helper mutations can be
beneficial in any of the tiling strategies noted above. In
general each probe having a particular interrogation position
has the same helper mutation(s). Thus, such probes have a
segment in common which shows perfect complementarity with a
reference sequence, except that the segment contains at least
one helper mutation (the same in each of the probes) and at
least one interrogation position (different in all of the
probes). For example, in the basic tiling strategy, a probe
from the first probe set comprises a segment containing an
interrogation position and showing perfect complementarity
with a reference sequence except for one or two helper
mutations. The corresponding probes from the second, third
and fourth probe sets usually comprise the same segment (or
sometimes a subsequence thereof including the helper
mutation(s) and interrogation position), except that the base
occupying the interrogation position varies in each probe.
See Fig. 9.

Usually, the helper mutation tiling strategy is used
in conjunction with one of the tiling strategies described
above. The probes containing helper mutations are used to
tile regions of a reference sequence otherwise giving low
hybridization signal (e.g., because of self-complementarity),
and the alternative tiling strategy is used to tile
intervening regions.

8. Bridging Strategy

Probes that contain partial matches to two separate
(i.e., noncontiguous) subsequences of a target sequence
sometimes hybridize strongly to the target sequence. In
certain instances, such probes have generated stronger signals
than probes of the same length which are perfect matches to
the target sequence. It is believed (but not necessary to the
invention) that this observation results from interactions of
a single target sequence with two or more probes
simultaneously. This invention exploits this observation to
provide arrays of probes having at least first and second
segments, which are respectively complementary to first and
second subsequences of a reference sequence. Optionally, the
probes may have a third or more complementary segments. These
probes can be employed in any of the strategies noted above.
The two segments of such a probe can be complementary to
disjoint subsequences of the reference sequence or contiguous
subsequences. If the latter, the two segments in the probe
are inverted relative to the order of the complement of the
reference sequence. The two subsequences of the reference
sequence each typically comprise about 3 to 30 contiguous
nucleotides. The subsequences of the reference sequence are
sometimes separated by 0, 1, 2 or 3 bases. Often the
sequences are adjacent and nonoverlapping.

For example, a wildtype probe is created by 
complementing two sections of a reference sequence (indicated 
by subscript and superscript) and reversing their order. The 
interrogation position is designated (*) and is apparent from 
comparison of the structure of the wildtype probe with the 
three mismatched probes. The corresponding nucleotide in the 
reference sequence is the "a" in the superscripted segment. 

Reference: 5' T GGCTA CGAGG AATCATCTGTTA 

Probes: 3' GCTCC CCGAT (Probe from first probe set)

3' GCACC CCGAT
3' GCCCC CCGAT
3' GCGCC CCGAT

The expected hybridizations are: 
Match: 

GCTCCCCGAT 

. . . TGGCTACGAGGAATCATCTGTTA 
GCTCC CCGAT 

Mismatch: 

GCTCC CCGAT 

. . . TGGCTACGAGGAATCATCTGTTA 
GCGCC CCGAT 
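
The construction of the bridging probes listed above (complement
two adjacent sections of the reference and reverse their order)
can be reproduced with a short Python sketch. The function names
and index arguments are assumptions for illustration; probes are
written in the same left-to-right sense as in the listing, i.e.
starting from the 3' end.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def complement(seq):
    """Base-by-base complement, kept in the same left-to-right order."""
    return "".join(COMPLEMENT[b] for b in seq)


def bridge_probe(reference, first_start, first_len, second_len):
    """Complement two adjacent sections of the reference and swap them."""
    first = reference[first_start:first_start + first_len]
    second_start = first_start + first_len
    second = reference[second_start:second_start + second_len]
    return complement(second), complement(first)


def bridge_probe_set(reference, first_start, first_len, second_len, interrogation):
    """Four probes differing only at one position of the leading segment."""
    left, right = bridge_probe(reference, first_start, first_len, second_len)
    return [left[:interrogation] + b + left[interrogation + 1:] + " " + right
            for b in BASES]


reference = "TGGCTACGAGGAATCATCTGTTA"
wildtype = bridge_probe(reference, 1, 5, 5)        # sections GGCTA and CGAGG
probe_set = bridge_probe_set(reference, 1, 5, 5, 2)
```

The resulting set contains the probe from the first probe set
(GCTCC CCGAT) and the three mismatched probes shown in the
listing.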

Bridge tilings are specified using a notation which
gives the length of the two constituent segments and the
relative position of the interrogation position. The
designation n/m indicates a segment complementary to a region
of the reference sequence which extends for n bases and is
located such that the interrogation position is in the mth
base from the 5' end. If m is larger than n, this indicates
that the entire segment is to the 5' side of the interrogation
position. If m is negative, it indicates that the
interrogation position is the absolute value of m bases 5' of
the first base of the segment (m cannot be zero). Probes
comprising multiple segments, such as n/m + a/b + ..., have a
first segment at the 3' end of the probe and additional
segments added 5' with respect to the first segment. For
example, a 4/8 + 6/3 tiling consists of (from the 3' end of the
probe) a 4 base complementary segment, starting 7 bases 5' of
the interrogation position, followed by a 6 base region in
which the interrogation position is located at the third base.
Between these two segments, one base from the reference
sequence is omitted. By this notation, the set shown above is
a 5/3 + 5/8 tiling. Many different tilings are possible with
this method, since the lengths of both segments can be varied,
as well as their relative position (they may be in either
order and there may be a gap between them) and their location
relative to the interrogation position.

As an example, a 16-mer oligo target was hybridized
to a chip containing all 4^10 probes of length 10. The chip
includes short tilings of both standard and bridging types.
The data from a standard 10/5 tiling were compared to data from
a 5/3 + 5/8 bridge tiling (see Table 1). Probe intensities
(mean count/pixel) are displayed along with discrimination
ratios (correct probe intensity / highest incorrect probe
intensity). Missing intensity values are less than 50 counts.
Note that for each base displayed the bridge tiling has a
higher discrimination value.

TABLE 1: Comparison of Standard and Bridge Tilings 

TILING PROBE BASE: CORRECT PROBE BASE

The bridging strategy offers the following
advantages:

(1) Higher discrimination between matched and mismatched
probes.

(2) The possibility of using longer probes in a bridging
tiling, thereby increasing the specificity of the
hybridization, without sacrificing discrimination.

(3) The use of probes in which an interrogation position
is located very off-center relative to the regions of target
complementarity. This may be of particular advantage
when, for example, a probe centered about one region of the
target gives low hybridization signal. The low signal is
overcome by using a probe centered about an adjoining region
giving a higher hybridization signal.

(4) Disruption of secondary structure that might result
in annealing of certain probes (see previous discussion of
helper mutations).

9. Deletion Tiling 

Deletion tiling is related to both the bridging and
helper mutant strategies described above. In the deletion
strategy, comparisons are performed between probes sharing a
common deletion but differing from each other at an
interrogation position located outside the deletion. For
example, a first probe comprises first and second segments,
each exactly complementary to respective first and second
subsequences of a reference sequence, wherein the first and
second subsequences of the reference sequence are separated by
a short distance (e.g., 1 or 2 nucleotides). The order of the
first and second segments in the probe is usually the same as
that of the complement to the first and second subsequences in
the reference sequence. The interrogation position is usually
separated from The comparison is performed with three other
probes, which are identical to the first probe except at an
interrogation position, which is different in each probe.

Reference: ...AGTACCAGATCTCTAA...

Probe set: CATGGNC AGAGA (N = interrogation position).
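
The probe set shown above can be regenerated from the reference
with the following Python sketch. The function name and its
index parameters are assumptions for illustration, and the probes
are written aligned left to right with the reference, as in the
listing.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def deletion_probe_set(reference, start, len1, gap, len2, interrogation):
    """Four probes sharing a deletion of `gap` reference bases between two
    complementary segments, and differing at one interrogation position
    (an index into the first segment)."""
    first = reference[start:start + len1]
    second_start = start + len1 + gap
    second = reference[second_start:second_start + len2]
    seg1 = "".join(COMPLEMENT[b] for b in first)
    seg2 = "".join(COMPLEMENT[b] for b in second)
    return [seg1[:interrogation] + b + seg1[interrogation + 1:] + seg2
            for b in BASES]


reference = "AGTACCAGATCTCTAA"
# Segments complementary to GTACCAG and TCTCT; the A between them is skipped.
probes = deletion_probe_set(reference, start=1, len1=7, gap=1, len2=5,
                            interrogation=5)
```

The probe with T at the interrogation position (CATGGTC AGAGA
without the space) is the one complementary to the reference on
both sides of the deleted base.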

Such tilings sometimes offer superior discrimination 
in hybridization intensities between the probe having an 
interrogation position complementary to the target and other 
probes. Thermodynamically, the difference between the
hybridizations to matched and mismatched targets for the probe
set shown above is the difference between a single-base bulge,
and a large asymmetric loop (e.g., two bases of target, one of
probe). This often results in a larger difference in
stability than the comparison of a perfectly matched probe 
with a probe showing a single base mismatch in the basic 
tiling strategy. 

The superior discrimination offered by deletion 
tiling is illustrated by Table 2, which compares hybridization 
data from a standard 10/5 tiling with a (4/8 + 6/3) deletion 
tiling of the reference sequence. (The numerators indicate
the length of the segments and the denominators, the spacing
of the deletion from the far termini of the segments.) Probe
intensities (mean count/pixel) are displayed along with
discrimination ratios (correct probe intensity / highest
incorrect probe intensity). Note that for each base displayed
the deletion tiling has a higher discrimination value than
either standard tiling shown.

The use of deletion or bridging probes is quite
general. These probes can be used in any of the tiling
strategies of the invention. As well as offering superior
discrimination, the use of deletion or bridging strategies is
advantageous for certain probes to avoid self-hybridization
(either within a probe or between two probes of the same
sequence).



10. Nucleotide Repeats 

Recently a new form of human mutation, expansion of
trinucleotide repeats, has been found to cause the diseases of
fragile X-syndrome, spinal and bulbar atrophy, myotonic
dystrophy and Huntington's disease. See Ross et al., TINS 16,
254-259 (1993). Long lengths of trinucleotide repeats are
associated with the mutant form of a gene. The longer the
length, the more severe the consequences of the mutation and
the earlier the age of onset. The invention provides arrays
and methods for analyzing the length of such repeats.

The different probes in such an array comprise
different numbers of repeats of the complement of the
trinucleotide repeat of interest. For example, one probe
might be a trimer, having one copy of the repeat, a second
probe might be a sixmer, having two copies of the repeat, and
a third probe might be a ninemer having three copies, and so
forth. The largest probes can have up to about sixty bases or
20 trinucleotide repeats.
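
Such a series of probes is simple to enumerate. The Python
sketch below is illustrative only: the function name is
hypothetical, the repeat CAG is used merely as an example, and
each probe is written 5' to 3' as the reverse complement of the
target repeat (an assumed convention).

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def repeat_probes(target_unit, max_repeats=20):
    """Probes carrying 1..max_repeats copies of the complement of a
    trinucleotide repeat, written 5' to 3'."""
    probe_unit = "".join(COMPLEMENT[b] for b in reversed(target_unit))
    return [probe_unit * copies for copies in range(1, max_repeats + 1)]


probes = repeat_probes("CAG")  # trimer, sixmer, ninemer, ... sixty-mer
```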

The hybridization signal of such probes to a target
of trinucleotide repeats is related to the length of the
target. It has been found that on increasing the target size
up to about the length of the probe, the hybridization signal
shows a relatively large increase for each complete
trinucleotide repeat unit in the target, and a small increase
for each additional base in the target that does not complete
a trinucleotide repeat. Thus, for example, the hybridization
signals for different target sizes to a 20 mer probe show
small increases as the target size is increased from 6 to 8
nucleotides and a larger increase as the target size is
increased to 9 nucleotides.

Arrays of probes having different numbers of repeats
are usually calibrated using known amounts of target of
different length. For each target of known length, the
hybridization intensity is recorded for each probe. Thus,
each target size is defined by the relative hybridization
signals of a series of probes of different lengths. The array
is then hybridized to an unknown target sequence and the
relative hybridization signals of the different sized probes
are determined. Comparison of the relative hybridization
intensity profile for different probes with comparable data
for targets of known size allows interpolation of the size of
the unknown target. Optionally, hybridization of the unknown
target is performed simultaneously with hybridization of a
target of known size labelled with a different color.
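
The calibration and interpolation steps described above can be illustrated with a minimal sketch. It is not taken from the specification: the function names, the normalization of each profile to unit sum, and the choice of linear interpolation between adjacent calibrated profiles are all assumptions made for illustration, and the intensities in any usage are hypothetical.

```python
def normalize(profile):
    """Scale probe intensities so they sum to 1 (relative signals)."""
    total = sum(profile)
    if total <= 0:
        raise ValueError("profile must contain positive signal")
    return [x / total for x in profile]


def estimate_target_length(calibration, unknown):
    """Interpolate the length of an unknown repeat target.

    calibration: dict mapping a known target length to a list of
                 intensities, one per probe (same probe order throughout).
    unknown:     list of intensities for the unknown target.

    Assumption: between two adjacent calibrated lengths the relative
    profile changes linearly, so the unknown is projected onto each
    such segment and the best-fitting segment is used.
    """
    lengths = sorted(calibration)
    if len(lengths) < 2:
        raise ValueError("need at least two calibrated target lengths")
    u = normalize(unknown)
    best = None  # (residual, estimated length)
    for lo, hi in zip(lengths, lengths[1:]):
        p0 = normalize(calibration[lo])
        p1 = normalize(calibration[hi])
        d = [b - a for a, b in zip(p0, p1)]
        denom = sum(x * x for x in d)
        t = 0.0
        if denom > 0:
            t = sum((ui - a) * di for ui, a, di in zip(u, p0, d)) / denom
            t = min(1.0, max(0.0, t))  # stay within the calibrated segment
        resid = sum((ui - (a + t * di)) ** 2 for ui, a, di in zip(u, p0, d))
        if best is None or resid < best[0]:
            best = (resid, lo + t * (hi - lo))
    return best[1]
```

Because only relative signals are compared, the sketch is insensitive to the overall amount of target applied, which mirrors the use of relative hybridization signals in the text.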

C. Preparation of Target Samples

The target polynucleotide, whose sequence is to be
determined, is usually isolated from a tissue sample. If the
target is genomic, the sample may be from any tissue (except
exclusively red blood cells). For example, whole blood,
peripheral blood lymphocytes or PBMC, skin, hair or semen are
convenient sources of clinical samples. These sources are
also suitable if the target is RNA. Blood and other body
fluids are also a convenient source for isolating viral
nucleic acids. If the target is mRNA, the sample is obtained
from a tissue in which the mRNA is expressed. If the
polynucleotide in the sample is RNA, it is usually reverse
transcribed to DNA. DNA samples or cDNA resulting from
reverse transcription are usually amplified, e.g., by PCR.
Depending on the selection of primers and amplifying
enzyme(s), the amplification product can be RNA or DNA.
Paired primers are selected to flank the borders of a target
polynucleotide of interest. More than one target can be
simultaneously amplified by multiplex PCR in which multiple
paired primers are employed. The target can be labelled at
one or more nucleotides during or after amplification. For
some target polynucleotides (depending on size of sample),
e.g., episomal DNA, sufficient DNA is present in the tissue
sample to dispense with the amplification step.

When the target strand is prepared in single-stranded
form as in preparation of target RNA, the sense of the strand
should of course be complementary to that of the probes on the
chip. This is achieved by appropriate selection of primers.
The target is preferably fragmented before application to the
chip to reduce or eliminate the formation of secondary
structures in the target. The average size of target
segments following hybridization is usually larger than the
size of probe on the chip.

II. Biotransformation Gene Chips
A. Biotransformation Genes 

Biotransformation genes tiled by the invention include
any of the 481 known cytochrome P450 genes, particularly, the
human P450 genes (see Nebert, DNA & Cell Biol. 10, 1-14
(1991); Nelson et al., Pharmacogenetics 6, 1-42 (1996)),
acetylase genes, monoamine oxidase genes, and genes known to
specifically biotransform particular drugs, such as the gene
encoding glucuronidase that participates in the pathway by
which codeine or morphine are converted to active form. Paul
et al., J. Pharm. Exp. Ther. 251, 477 (1989). Other genes of
particular interest include P450 2D6, P450 2C19, N-acetyl
transferase II, glucose 6-phosphate dehydrogenase,
pseudocholinesterase, catechol-O-methyl transferase,
thiopurine methyltransferase and dihydropyridine
dehydrogenase. cDNA and at least partial genomic DNA
sequences are available for these genes, e.g., from data bases
such as GenBank and EMBL (see Table 3).



TABLE 3: ACCESSION NUMBER CYP LIST

GENE          ACCESSION NUMBER(S)                  IMPORTANCE

CYP1A1        D12525, D01198                       Cancer Susceptibility
CYP1A2        M31664, [others illegible]
[illegible]   [illegible]
[illegible]   [illegible]                          Coumarin 7-hydroxylation
[illegible]   [illegible]
[illegible]   [illegible]                          Warfarin Metabolism
CYP2C18       M61853, J05326, M61856               Drug Metabolism
CYP2C19       L07093, M61854, [others illegible]   S-mephenytoin 4-hydroxylase
CYP2D6        M2040?, M19697, M24499,              Debrisoquine/sparteine Polymorphism
              [others illegible]
              [illegible]                          CYP2D7P pseudogene
              [illegible]                          CYP2D8P pseudogene
CYP2E1        [illegible], J02843                  [illegible]
CYP3A4        D11131, M14096                       Polymorphic Drug Metabolism
CYP4F2        U02388                               Leukotriene B4 omega hydroxylase
NAT2          U23052, U23434                       Drug Acetylation/Drug Induced
                                                   Disease
TPMT          U11424, U12387                       Thiopurine Methyl Transferase -
                                                   transplantation and childhood
                                                   leukemia

Additional genomic sequences flanking the regions already
sequenced are easily determined by PCR-based gene walking.
See Parker et al., Nucl. Acids Res. 19:3055-3060. A specific
primer for the sequenced region is paired with a general
primer that hybridizes to the flanking region.

The CYP2D6 enzyme has debrisoquine oxidase activity. See,
e.g., Kimura et al., Am. J. Human. Genet. 45, 889-904 (1989).

Several therapeutically important compounds are
metabolized by CYP2D6. The list includes cardioactive drugs:
β-blockers (bufuralol, propranolol, metoprolol, timolol) and
antiarrhythmics (sparteine, encainide, flecainide, mexiletine)
(Buchert & Woosley, Pharmacogenetics 2, 2-11 (1992);
Birgersdotter et al., Brit. J. Clin. Pharmacol. 33, 275-280
(1992)); psychoactive drugs including tricyclic
antidepressants (imipramine, desipramine, nortriptyline) and
antipsychotics (clozapine and haloperidol) (Dahl & Bertilsson,
Pharmacogenetics 3, 61-70 (1993); Fischer et al., J.
Pharmacol. Exp. Ther. 260, 1355-1360 (1992); Lerena et al.,
Drug Monitor 14, 92-97 (1992)); as well as a variety of other
commonly used drugs including codeine and dextromethorphan
(Eichelbaum & Gross, Pharmac. Ther. 46, 377-394 (1990)),
amphetamine and cocaine. Ten percent of the general
population is defective in P450 2D6, an enzyme that
demethylates codeine at an earlier stage in the activation
pathway, and therefore derives no analgesic benefit from
codeine (see Sindrup & Brosen, Pharmacogenetics 5, 335-346
(1995)).

At least seven different polymorphic variants of the
CYP2D6 gene demonstrating autosomal recessive inheritance are
associated with a poor drug metabolizer phenotype (see Table
4). These alleles are designated CYP2D6A, CYP2D6B, CYP2D6C,
CYP2D6D, CYP2D6E, CYP2D6F, and CYP2D6J (Gonzales & Idle, Clin.
Pharmacokinet. 26(1), 59-70 (1994); Nelson et al., DNA & Cell
Biol. 12(1), 1-51 (1993)). CYP2D6A, CYP2D6E and CYP2D6F are
minor variants of the wild type gene. CYP2D6A has a single
nucleotide deletion in exon 5 with a consequent frame shift
(Kagimoto et al., J. Biol. Chem. 265, 17209-17214 (1990)).
CYP2D6E and CYP2D6F are rare, recently described variants
(Gonzales & Idle, supra). CYP2D6B accounts for about 70% of
defective alleles. This variant has point mutations in exons
1, 3, 8 and 9 as well as a base change at the third intron
splice site that results in aberrant transcript splicing
(Gonzales et al., Nature 331, 442-446 (1988); Kagimoto et al.,
J. Biol. Chem. 265, 17209-17214 (1990)). CYP2D6C has a three
base deletion in exon 5 (Broly and Meyer, Pharmacogenetics 3,
123-130 (1993)) and, on the CYP2D6D allele, the entire
functional gene is deleted although the pseudogenes remain
intact (Gaedigk et al., Am. J. Hum. Genet. 48, 943-950
(1991)). The CYP2D6J allele has base changes in both the
first and ninth exons that result in amino acid changes
(Yokota et al., Pharmacogenetics 3, 256-263 (1993)). The
CYP2D6 gene clusters with other CYP2D genes on human
chromosome 22. Also present in this region are two or three
highly conserved pseudogenes. Of these, CYP2D7P (three
variant forms) and CYP2D8P have been isolated and sequenced
(Kimura et al., supra; Heim & Meyer, supra).

TABLE 4



ALLELE          (EXON) NUCLEOTIDE CHANGES         XBAI HAPLOTYPE   ENZYME ACTIVITY  REF.

CYP2D6-wt                                         29 kb                             (97), (9)

CYP2D6-L        (3) 1726 G -> C                   29 kb            NORMAL           (12), (11), (1)
                (6) 2938 C -> T/296 Arg -> Cys
                (9) 4268 G -> C/486 Ser -> Thr
                (6) 2938 C -> T/296 Arg -> Cys
                (6) 2938 C -> T/296 Arg -> Cys

CYP2D6-A        (5) 2637 ΔA                       29 kb                             (15)

CYP2D6-B        (4) 1934 A (+ 6 other mutations)  29 kb                             (15)
                                                  44 kb
                                                  9 + 16 kb

CYP2D6-D        Deletion                          11.5 kb (13 kb)  ABSENT           (14)

CYP2D6-E        (6) 3023 A -> C/324 His -> Pro    29 kb                             (99)

CYP2D6-ΔT1795   (3) 1795 ΔT/152 Trp -> Gly,       29 kb                             (98), (100)
                    153 Stop

CYP2D6-C        (5) 2703-5 ΔAAG/281 ΔLys          29 kb                             (44), (101)

CYP2D6-J        (1) 188 C -> T/34 Pro -> Ser      29 kb                             (16)
                (3) 1749 G -> C                   44 kb
                (9) 4268 G -> C/486 Ser -> Thr

CYP2D6-W        (1) 188 C -> T/34 Pro -> Ser      29 kb            DECREASED        (102)
                (9) 4268 G -> C/486 Ser -> Thr    44 kb

CYP2D6-Ch1      (1) 188 C -> T/34 Pro -> Ser      29 kb                             (103)
                (2) 1127 C -> T                   44 kb
                (3) 1749 G -> C
                (9) 4268 G -> C/486 Ser -> Thr

(CYP2D6-L)      Amplification of D6-L             175 kb           INCREASED        (12)

(CYP2D6-L)      Duplication of D6-L               42 kb            INCREASED        (12)

Presently used trivial names of CYP2D6 alleles; summary
of CYP2D6 alleles, haplotypes and their phenotypic
consequences (modified from U.A. Meyer).

9) Kimura et al. Am J Hum Genet 45: 889-904 (1989)
11) Armstrong et al. Hum Genet 91: 616-617 (1993)
12) Johansson et al. PNAS 90: 11825-11829 (1993)
13) Tsuneoka et al. J. Biochem Tokyo 114: 263-266 (1993)
14) Gaedigk et al. Am J Hum Genet 48: 943-950 (1991)
15) Kagimoto et al. J Biol Chem 265: 17209-17214 (1990)
16) Yokota et al. Pharmacogenet 3: 256-263 (1993)
44) Tyndale et al. Pharmacogenet 1: 26-32 (1991)
97) Gonzales et al. Nature 331: 442-446 (1988)
98) Evert et al. Pharmacogenet 4: 271-274 (1994)
99) Evert et al. Naunyn-Schmiedebergs Arch Pharmacol 350:
434-439 (1994)
100) Saxena et al. Hum Mol Genet 3: 923-926 (1994)
101) Broly et al. Pharmacogenet 3: 123-130 (1993)
102) Wang et al. Clin Pharmacol Ther 53: 410-418 (1993)
103) Johansson et al. Mol Pharmacol 46: 452-459 (1994)

The 2C19 gene is the principal human determinant of
S-mephenytoin hydroxylase. Drugs metabolized by this enzyme in
addition to mephenytoin include antidepressants and
neuroleptics. Variant alleles are described in de Morais et
al., J. Biol. Chem. 269(22), 15419-15422 (1994); de Morais et
al., Molecular Pharmacology 46, 594-598 (1994). Mutations are
known to occur at nucleotides 636 (G-A) and 681 (G-A) of the
coding sequence.

CYP2E1 is responsible for metabolizing several
anesthetics including ethanol. CYP2A6 metabolizes nicotine.
CYP2C9 metabolizes warfarin. A table showing other pairs of
drugs and cytochromes P450 that either metabolize the drug or
are inhibited by it appears below.

to be screened. The choice of primers also depends on the
strand to be amplified. Because some nonallelic P450 genes
show a high degree of sequence identity, selection of primers
can be important in determining whether one or more nonallelic
segments is amplified. Usually, primers will be selected to
be perfectly complementary to a unique sequence within a
selected target resulting in amplification of only that
target. Examples of suitable primers are shown in Table 6 (F =
forward primer, R = reverse primer).

TABLE 6

SEQUENCE NAME    SEQUENCE

CYP2DE1F         GCCAGGTGTGTCCAGAGGAGCCCAT
CYP2DE1R         CTGGTAGGGGAGCCTCAGCACCTCT
CYP2DE2F         TAGGACTAGGACCTGTAGTCTGGGGT
CYP2DE2R         GGTCCCACGGAAATCTGTCTCTGT
CYP2DE34F        CTAATGCCTTCATGGCCACGCGCA
CYP2DE34R        TCGGGAGCTCGCCCTGCAGAGA
CYP2DE5F         GGGCCTGAGACTTGTCCAGGTGAA
CYP2DE5R         CCCTCATTCCTCCTGGGACGCTCAA
CYP2DE6F         CCCGTTCTGTCCCGAGTATGCTCT
CYP2DE6R         TCGGCCCCTGCACTGTTTCCCAGA
CYP2DE7F         GCTGACCCATTGTGGGGACGCAT
CYP2DE7R         CTATCACCAGGTGCTGGTGCTGAGCT
CYP2DE89F        GGGAGACAAACCAGGACCTGCCAGA
CYP2DE89R        CTCAGCCTCAACGTACCCCTGTCT
CYP2D678-F       TGAGAGCAGCTTCAATGATGAGAACCT
CYP2D678-R       GTAGGATCATGAGCAGGAGGCCCCA
CYP-PCR8-F       TCCCCCGTGTGTTTGGTGGCA
CYP-PCR9-R       TGCTTTATTGTACATTAGAGC
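
The selection criterion stated before Table 6 (a primer perfectly complementary to a sequence that is unique to the selected target) can be checked in silico along the following lines. This is only a sketch under stated assumptions: the function names are hypothetical, only perfect matches are counted (real primers may also anneal with mismatches), and any template sequences supplied in a usage example would be illustrative rather than actual gene or pseudogene sequences.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (upper-case ACGT)."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_perfect_sites(primer, template):
    """Count perfect-match priming sites on either strand of a template.

    The template is given as one strand; a hit for the primer itself
    means it anneals to the opposite strand, and a hit for its reverse
    complement means it anneals to the given strand.
    """
    hits = 0
    for query in {primer, reverse_complement(primer)}:
        start = template.find(query)
        while start != -1:
            hits += 1
            start = template.find(query, start + 1)  # allow overlaps
    return hits


def is_target_specific(primer, target, non_targets):
    """True if the primer has exactly one perfect site in the target
    and none in any non-target sequence (e.g., a pseudogene)."""
    if count_perfect_sites(primer, target) != 1:
        return False
    return all(count_perfect_sites(primer, other) == 0
               for other in non_targets)
```

A primer that passes this check against the true gene but fails it against a pseudogene sequence is the kind of primer the text describes as amplifying only the selected target.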



For analysis of mutants through all or much of a gene, it
is often desirable to amplify several segments from several
paired primers. The different segments may be amplified
sequentially or simultaneously by multiplex PCR. Frequently,
fifteen or more segments of a gene are simultaneously
amplified by PCR. The primers and amplification conditions
are preferably selected to generate fluorescently labelled DNA
targets. Double-stranded targets are enzymatically degraded to
fragments of about 100 bp and denatured before hybridization.

D. Tiling Strategies

Mutations in biotransformation genes can be detected by
any of the tiling strategies noted above. For detection of
hitherto uncharacterized mutations, the basic tiling strategy
is one suitable strategy. The chips contain probes tiling
across some or all of a reference sequence.

For detecting precharacterized mutations, which account
for the large majority of poor metabolizers in the preferred
reference genes described above, the block tiling strategy is
one particularly useful approach. In this strategy, a group
(or block) of probes is used to analyze a short segment of
contiguous nucleotides (e.g., 3, 5, 7 or 9) from a
biotransformation gene centered around the site of a mutation.

In a preferred embodiment, a first group of probes is
tiled based on a wildtype reference sequence and a second
group is tiled based on a mutant version of the wildtype
reference sequence. The mutation can be a point mutation,
insertion or deletion or any combination of these. The
presence of first and second groups of probes facilitates
analysis when multiple target sequences are simultaneously
applied to the chip, as is the case when a patient being
diagnosed is heterozygous in a biotransformation gene. The
principles of chip design and analysis are as described for
the CFTR chip.

E. Modifications for Determining Gene Copy Number 

The tiling arrays of the invention are usually capable of
simultaneously analyzing heterozygous alleles of a target
sequence. The presence of heterozygous alleles is signalled
by two probes having interrogation positions aligned with the
mutation showing specific hybridization, rather than one, as
would be the case for homozygous alleles. Interpretation of
hybridization patterns is, however, sometimes complicated by
the presence of less than, or more than, the two expected
copies of a biotransformation gene in an individual.

For example, an individual having one wildtype copy of
the gene, and a wholly deleted second copy of the gene would
show a similar hybridization pattern to an individual with two
wildtype copies (other than for possible differences in
overall intensity of the pattern). In fact, complete gene
deletions of one or both copies of a gene account for
approximately 15% of slow metabolizers having defective
biotransformation enzymes. Analogous loss of heterozygosity
occurs in other diseases such as cancer (p53) and muscular
dystrophy (dystrophin gene).

Further, an individual with three wildtype copies of a
biotransformation gene would show a similar hybridization
pattern to an individual with two copies of the gene, other
than for a difference in overall intensity. Individuals
having multiple copies of a biotransformation gene are
referred to as super metabolizers, because of their elevated
levels of enzymes.

Additional complications in interpreting a hybridization
pattern can result from the presence of pseudogenes in an
individual. A pseudogene is an analog of a true gene that
shows strong sequence identity to the true gene but is not
expressed. Most pseudogenes having counterparts among the
biotransformation genes have been sufficiently well
characterized that their presence can be avoided by
appropriate selection of amplification primers (i.e., primers
are selected that hybridize to the true gene of interest
without hybridizing to the pseudogene). For example, 5' TGA
GAG CAG CTT CAA TGA TGA GAA CCT 3' and 5' GTA GGA TCA TGA GCA
GGA GGC CCC A 3' can be used for amplifying exon 6. However,
occasionally a pseudogene might be unexpectedly amplified
together with a true gene, and the presence of mutations in
the pseudogene (which in fact have no phenotypic effect) might
be mistakenly thought to occur in the true gene.

The invention provides tiling arrays to overcome these
difficulties by indicating how many copies of a target are
present in a sample. In addition to containing the probes
required for detecting polymorphism(s) associated with drug
sensitivity, disease susceptibility or other phenotype by the
tiling strategies described above, these arrays contain probes
for analyzing polymorphic sites of a target gene which, in
general, do not exert any phenotypic effect (i.e., silent
polymorphic sites). The frequency and diversity of such sites
is usually greater than that of mutations whose presence does
exert a phenotypic effect. Silent sites are predominantly
found in intronic regions and in flanking regions (i.e.,
within about 20 kb of transcribed regions), where selective
pressure is generally lower relative to the coding regions.
Polymorphisms used to assess gene copy number should be on the
same chromosome as the gene containing a phenotype-determining
mutation of interest, and are often in the same gene or
flanking sequences thereto.

Any number of additional polymorphic sites can be tiled
using the same strategies as previously described. For any
particular polymorphic site, each form of the polymorphism at
that site serves as a reference sequence for a separate
tiling. In some instances, silent polymorphic sites can be
amplified from the same primers and on the same amplicon as
the sites of potential mutations. In other instances,
separate amplification is required.

Silent polymorphic regions can be identified by comparing
segments of target DNA, particularly introns and flanking
regions, from different individuals. Comparison can be
performed using the general tiling strategies disclosed above
or by conventional techniques such as single-stranded
conformational analysis. See, e.g., Hayashi, PCR Methods &
Applications 1, 34-38 (1991); Orita, Proc. Natl. Acad. Sci.
USA 86, 2766-2770 (1989); Orita et al., Genomics 5, 874-879
(1989). This method has been successfully employed in
dystrophin gene analysis coupled with heteroduplex formation
to scan for new mutations. Prior et al., Human Molecular
Genetics 2, 311-313 (1993).

Analysis of the hybridization pattern of a probe array
tiling a silent polymorphic region indicates which of the
polymorphic forms are present at this region. Consider a
polymorphism constituting a single base change. If the
polymorphism and flanking sequences are tiled according to the
basic strategy using four probe sets, there are four probes
having an interrogation position aligned with the single base
at which the polymorphism occurs. The number of these four
probes that show specific hybridization indicates the number of
different polymorphic forms present, and hence, the minimum
number of copies of a gene present. For example, if two
probes show specific hybridization, at least two polymorphic
forms are present. There may be more copies of the gene than
polymorphic forms observed at any one site, because the same
polymorphic form may be present in more than one copy of the
gene. However, if sufficient polymorphic sites are examined,
it is likely that a site will be found at which each copy of
the gene exists in a different polymorphic form. The number
of polymorphic sites that needs to be tested depends on the
number of polymorphic forms and their relative frequencies at
each site. Typically, the number of sites varies from 1-100,
with at least 5, 10, 20 or 50 sites being common. The copy
number of a gene can be deduced from the number of polymorphic
forms present at the polymorphic site(s) showing the greatest
number of polymorphic forms.
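
The deduction described above (copy number taken from the polymorphic site showing the greatest number of forms) can be sketched as follows. The specification does not say how "specific hybridization" is scored; the relative threshold of 0.5 used here, the function names and the data layout are assumptions for illustration only.

```python
def forms_detected(column, rel_threshold=0.5):
    """Return the probes counted as showing specific hybridization.

    column:        dict mapping the base at the interrogation position
                   of each of the four probes to its intensity.
    rel_threshold: assumed cutoff, as a fraction of the brightest probe.
    """
    brightest = max(column.values())
    if brightest <= 0:
        return set()
    return {base for base, signal in column.items()
            if signal >= rel_threshold * brightest}


def minimum_copy_number(sites, rel_threshold=0.5):
    """Lower bound on gene copy number from silent polymorphic sites.

    sites: dict mapping a site name to its four-probe column.
    The result is the largest number of polymorphic forms seen at any
    single site; the true copy number may be higher if copies share
    the same form at every site examined.
    """
    return max((len(forms_detected(col, rel_threshold))
                for col in sites.values()), default=0)
```

For example, a sample in which one site shows three specifically hybridizing probes must contain at least three copies of the gene, whatever the other sites show.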

If a silent polymorphism is more complicated than a
single-base change (e.g., deletion or insertion), the number
of polymorphic forms can be determined from alternative
tilings to the different forms, as generally described in
§I.B.1. For example, if all the perfectly matched probes in a
first tiling hybridize, it is concluded that the polymorphic
form constituting the reference sequence for the first tiling
is present. If all the perfectly matched probes in two (or
more) tilings hybridize, it is concluded that two (or more)
polymorphic forms are present.

F. Applications

In general, the biotransformation genes described above
are inherited in an autosomal recessive fashion. The presence
of a homozygous mutation or two heterozygous mutations in an
individual signals that the individual is a poor metabolizer
of any drug metabolized by the biotransformation gene in which
the mutation occurs. Some individuals with one mutant and one
normal gene show a near wildtype phenotype, but other such
individuals show an intermediate phenotype between normal and
homozygous mutant. Individuals having additional copies of a
biotransformation gene usually express the gene product at
higher levels than a wildtype individual.

The screening methods can be routinely applied as a
precaution before administering a drug to a patient for the
first time. If the patient is found to lack both copies of a
gene expressing an enzyme required for detoxification of a
particular drug, the patient generally should not be
administered the drug or should be administered the drug in
smaller doses compared with patients having normal levels of
the enzyme. The latter course may be necessary if no
alternative treatment is available. If the patient is found
to lack both copies of a gene expressing an enzyme required
for activation of a particular drug, the drug will have no
beneficial effect on the patient and should not be
administered. Patients having one wildtype copy of a gene and
one mutant copy of a gene, and who are at risk of having lower
levels of an enzyme, should be administered drugs metabolized
by that enzyme only with some caution, again depending on
whether alternatives are available. If the drug is detoxified
by the enzyme in question, the patient should in general be
administered a lower dose of the drug. If the drug is
activated by the enzyme, the heterozygous patient should be
administered a higher dosage of the drug. The reverse applies
for patients having additional copy(ies) of a particular
biotransformation gene, who are at risk of having elevated
levels of an enzyme. The more rational selection of
therapeutic agents that can be made with the benefit of
screening results in fewer side effects and greater drug
efficacy in poor metabolizer patients.

The methods are also useful for screening populations of
patients who are to be used in a clinical trial of a new drug.
The screening identifies a pool of patients, each of whom has
wildtype levels of the full complement of biotransformation
enzymes. The pool of patients is then used for determining
safety and efficacy of the drugs. Drugs shown to be effective
by such trials are formulated for therapeutic use with a
pharmaceutical carrier such as sterile distilled water,
physiological saline, Ringer's solution, dextrose solution,
and Hank's solution.

The chips are also useful for screening patients for
increased risk of cancer in similar manner to the p53 chips of
the invention. Some biotransformation enzymes have roles in
activating environmental procarcinogens to carcinogenic form
(e.g., 1A1, 2D6, 2E1 and N-acetyltransferase). Mutations in
genes encoding these enzymes are associated with reduced
cancer risk. Other biotransformation enzymes have roles in
detoxifying environmental carcinogens, e.g., glutathione
S-transferase M1. Mutations in one, and especially both, copies
of genes encoding such enzymes are associated with enhanced
susceptibility to cancer. See Shields, Environmental Health
Perspectives 102 (sup. 11), 81-87 (1994).

CYP genotype information can be useful to prevent
drug-drug interactions in two main ways. First, some drugs are
known to inhibit specific CYP enzymes. When such a drug is
given, care should be taken not to give a second drug handled
by the inhibited pathway (see Table 4). Second, when a person
is genotyped as a poor metabolizer, not only should drug doses
be decreased, but second drugs handled by the poorly
metabolizing pathway should not be added to the therapy.

Example 

Fig. 10 shows the layout of probes and a computer-simulated
hybridization pattern for an exemplary chip
containing tilings for CYP2D6 and CYP2C19 (wildtype). The
chip contains a number of separate tilings as follows.

(1) A tiling (basic strategy) of all 9 exons plus 5
nucleotides of each intron bordering the exons of the CYP2D6
gene. The probes were 14 mers with the interrogation position
at nucleotide 7. This tiling is in the upper right of the figure
(excluding the eleven columns of probe sets on the left of the
chip). Each lane of probes is divided into four columns,
occupied by probes differing at the interrogation position.
At any one column, a nucleotide in the target sequence aligned
with the column position is identified as the complement of
the nucleotide in the column having the highest fluorescent
intensity.

(2) A tiling (basic strategy) of the complete coding
sequence (cDNA/mRNA) of CYP2C19 (wildtype). The probes were
15 mers with the interrogation position at nucleotide 7. This
tiling is in the lower half of the figure (excluding the
eleven columns of probes at the left of the figure).

(3) A series of "opti-block" tilings for analysis of
known mutations in CYP2D6 and CYP2C19. These blocks run down
the lefthand eleven columns of the figure. These blocks are
labelled 2C19 m2 (mutation in cytochrome P450 2C19), P34S,
L91M, H94A, p1085, p1127, Delta T 1795, p1749, ss 1934, G212E,
2637DeltaA, delta281, 296C, H324P, L421P, S486T, 2C19 m1
(mutation in cytochrome P450 2C19). Unless otherwise
indicated, the mutations occur in cytochrome P450 2D6.

(4) A series of alternative tilings for analysis of
known polymorphic differences between CYP2D6 and its
pseudogenes CYP2D7P, CYP2D7AP and CYP2D8P. These tilings are
also in the lefthand column of the figure. These tilings are
labelled Ex6p 2D6/2D7, Ex2p 2D6/2D7, Ex2p 2D6/2D8, Ex4p
2D6/2D7, Ex4p 2D6/2D8, Ex6p 2D6/2D8, Ex7p 2D6/2D7, and Exp
2D6/2D7.
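
The base-calling rule stated in item (1) above (the target nucleotide is the complement of the probe base giving the highest intensity in a column) can be written as a short sketch. The function names and data layout are hypothetical, and the ratio computed here (brightest probe over second-brightest) is offered only as a proxy for a discrimination ratio when the correct base is not known in advance.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(column):
    """Call one target nucleotide from a column of four probes.

    column: dict mapping the probe base at the interrogation position
            ("A", "C", "G", "T") to its fluorescence intensity.
    """
    brightest_probe = max(column, key=column.get)
    return COMPLEMENT[brightest_probe]


def discrimination_ratio(column):
    """Brightest intensity divided by the second-brightest intensity."""
    ranked = sorted(column.values(), reverse=True)
    if ranked[1] <= 0:
        return float("inf")
    return ranked[0] / ranked[1]


def call_sequence(columns):
    """Read successive target nucleotides from successive columns."""
    return "".join(call_base(col) for col in columns)
```

A low ratio at a column is a hint that the call is unreliable or that more than one target form (for example, a heterozygote) is present at that position.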

Fig. 11 shows an alternative tiling designed to
distinguish 2D6 from the pseudogene 2D7 in CYP2D6.
Alternative tilings are formed from two interdigitated
tilings, each designed according to the basic tiling strategy
based on two different reference sequences, in this case 2D6
and 2D7. The first column contains four probes complementary
to the CYP2D6 sequence except at the interrogation position.
The second column contains four probes complementary to the
CYP2D7 sequence except at the interrogation position. The
interrogation positions of the first and second columns of
probes align with the same positions of the target sequence.
The same strategy of alternating probes from the respective
2D6 and 2D7 reference sequences continues throughout the
alternative tiling. When the tiling is hybridized to only the
CYP2D6 form, only probes complementary to CYP2D6 (i.e., the
columns labelled 6) light up. Conversely when the tiling is
hybridized to only the CYP2D7 form, only probes in the columns
labelled 7 light up. When the tiling is hybridized to a
mixture of CYP2D6 and CYP2D7, the pattern is the sum of the
pattern for the two individual forms. The characteristic
patterns throughout the tiling allow distinction of whether
CYP2D6, CYP2D7 or both are present.

Fig. 12 shows an optiblock of probes for distinguishing
the P34S mutation from the wildtype sequence of CYP2D6. In an
optiblock, probes are selected based on the block tiling
strategy. That is, all probes align with the same segment of
target DNA but differ in the location of the interrogation
position and in whether the probes are tiled based on a
wildtype or mutant reference sequence. The notation "n" above
the chip indicates that the interrogation position is aligned
with the site of the P34S mutation in the target DNA, the
notations n-1 and n+1 indicate interrogation positions
aligned one base either side of the site of mutation, and so
forth. As in the alternate tiling, probes tiled on wildtype
and mutant sequences (sometimes referred to as wildtype and
mutant probes) are interdigitated. The result of hybridizing
the optiblock to wildtype target is that all columns
containing probes tiled based on the wildtype sequence light
up. In addition, one column of probes based on the mutant
sequence lights up, this being the column of probes having an
interrogation position aligned with the "n" nucleotide in the
target. The result of hybridizing the optiblock to the mutant
target is the reverse; that is, all columns of probes tiled
based on the mutant target sequence light up, and a single
column of probes tiled based on the wildtype sequence lights
up. When the optiblock is hybridized to a heterozygous target
containing wildtype and mutant forms, the pattern is the sum
of those obtained with the individual targets alone. Thus,
all three possible targets, homozygous wildtype, homozygous
mutant and heterozygote, give distinct patterns of
hybridization and can be distinguished.
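
The expected optiblock patterns can be illustrated with a
short sketch. The sketch is not part of the disclosed method;
it assumes an idealized, all-or-nothing model in which a
column lights up only if one of its four probes is perfectly
complementary to a target. The sequences, the five
interrogation offsets (n-2 to n+2), the wildtype-before-mutant
column order and the function names are all hypothetical
choices made for illustration.

```python
# Idealized optiblock model: binary hybridization, perfect match only.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
BASES = "ACGT"


def probe_column(reference, pos):
    """Return the four probes of one column (written aligned to the target).

    Each probe is the complement of the reference segment, except that the
    interrogation position `pos` is occupied by A, C, G or T in turn.
    """
    comp = [COMPLEMENT[b] for b in reference]
    return ["".join(comp[:pos] + [b] + comp[pos + 1:]) for b in BASES]


def hybridizes(probe, target):
    """True if the probe is perfectly complementary to the target segment."""
    return all(COMPLEMENT[t] == p for p, t in zip(probe, target))


def optiblock_pattern(wildtype, mutant, site, targets, flank=2):
    """Return the 1-based indices of the columns that light up.

    Columns are interdigitated: for each interrogation offset from
    site-flank to site+flank, a wildtype column is followed by a mutant one.
    """
    lit = []
    column = 0
    for pos in range(site - flank, site + flank + 1):
        for reference in (wildtype, mutant):
            column += 1
            probes = probe_column(reference, pos)
            if any(hybridizes(p, t) for p in probes for t in targets):
                lit.append(column)
    return lit


if __name__ == "__main__":
    wt = "GACCTGA"   # hypothetical wildtype segment
    mut = "GACTTGA"  # same segment with a C-to-T change at index 3
    print("wildtype:    ", optiblock_pattern(wt, mut, 3, [wt]))
    print("mutant:      ", optiblock_pattern(wt, mut, 3, [mut]))
    print("heterozygote:", optiblock_pattern(wt, mut, 3, [wt, mut]))
```

Under these assumptions a wildtype target lights the five
wildtype columns plus the single mutant column at position n,
a mutant target gives the reverse pattern, and a heterozygous
target gives the union of the two, so the three genotypes
remain distinguishable. Real hybridization intensities are
graded rather than binary.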

The chip was hybridized with fluorescein-labelled-dGTP
double-stranded DNA made by PCR from a plasmid template
containing the genomic clone of CYP2D6-B. The entire gene was
amplified as 4 separate PCR products, all of which were
present during hybridization. dUTP was incorporated during
PCR and the PCR products were treated with uracil DNA
glycosylase, then heated to 95°C for 5 min before
hybridization to fragment and denature double-stranded
material. Hybridization was for 30 min at 37°C in 0.5 M LiCl
plus 0.0005% NaLauroylSarkosine. Washing was performed prior
to scanning, in the same solution without target DNA, for
5 min at room temperature.

Fig. 13 shows the chip hybridized to a CYP2D6-B target.
A portion of the basic tiling pattern is shown magnified in
the lower right hand corner. Successive nucleotides in the
target sequence can be read by eye by comparing the
intensities of the four squares in a column. From top to
bottom, these squares are respectively occupied by probes
having A, C, G and T at the interrogation position. The
nucleotide occupying the position in the target sequence
aligned with the interrogation position of a column of probes
is the complement of the interrogation position of the probe
showing the highest signal. The SS1934 mutation in CYP2D6-B
results in a G to A transition and loss of function. The
enlarged hybridization pattern in the lower right of the
figure has an arrow in the column corresponding to nucleotide
1934. In this column, the probe hybridizing most strongly has
a T in the interrogation position. This implies that the
corresponding nucleotide in the target is the complement of T,
i.e., A, indicating that the mutant form of the target is
present. The same result is apparent from the optiblock shown
in the upper left of the figure. This block shows three
consecutive columns in which the T-probe lights up. Two of
these columns are from wildtype and mutant probes having
interrogation positions aligned with nucleotide 1934. The
third column (the leftmost of the three) is the mutant probe
having an interrogation position aligned with nucleotide 1933.
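
The reading of a column described above can be expressed as a
brief sketch. The intensity values and the function name are
hypothetical; only the rule itself (the target nucleotide is
the complement of the interrogation base of the brightest
probe) comes from the description.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(intensities):
    """Call the target nucleotide for one column of four probes.

    `intensities` maps the interrogation base of each probe (A, C, G, T)
    to its measured signal. The called nucleotide is the complement of
    the interrogation base of the probe with the highest signal.
    """
    brightest = max(intensities, key=intensities.get)
    return COMPLEMENT[brightest]


if __name__ == "__main__":
    # Hypothetical counts for the column aligned with nucleotide 1934,
    # in which the probe with T at the interrogation position is brightest.
    column_1934 = {"A": 120, "C": 95, "G": 140, "T": 2100}
    print(call_base(column_1934))
```

A practical implementation would likely also compare the
highest signal with the second highest before accepting a
call; that refinement is omitted here.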

Fig. 14 shows magnifications of the hybridization
patterns of L421P and S486 opti-tiling blocks. In each case,
the first, third, fifth, sixth, seventh, and ninth columns
light up. This pattern indicates that homozygous wildtype
sequence is present (see the idealized pattern for homozygous
wildtype in Fig. 12).

In a separate experiment, the chip was hybridized to
CYP2C19 cDNA, as shown in Fig. 15. The Figure shows that the
lower portion of the chip containing the 2C19 tiles is lit. A
magnification of part of the hybridization pattern from the
basic tiling sequence is shown in the upper right of the
Figure. Again, the sequence can be read by eye by comparing
the intensities of the four probes forming a column.

III. MODES OF PRACTICING THE INVENTION

A. SYNTHESIS OF ARRAYS

Arrays of probes immobilized on supports can be
synthesized by various methods. A preferred method is
VLSIPS™ (see Fodor et al., Nature 364, 555-556 (1993); US
5,143,854; EP 476,014; PCT/US94/12305), which entails the use
of light to direct the synthesis of oligonucleotide probes in
high-density, miniaturized arrays. Algorithms for design of
masks to reduce the number of synthesis cycles are described
by Hubbell et al., US 5,571,639 and US 5,593,839. Arrays can
also be synthesized in a combinatorial fashion by delivering
monomers to cells of a support by mechanically constrained
flowpaths. See Winkler et al., EP 624,059. Arrays can also
be synthesized by spotting monomer reagents onto a support
using an ink jet printer. See id.; Pease et al., EP 728,520.

B. PREPARATION OF LABELED DNA/HYBRIDIZATION TO ARRAY

1. PCR 

PCR amplification reactions are typically conducted
in a mixture composed of, per reaction: 1 µl genomic DNA; 10
µl each primer (10 pmol/µl stocks); 10 µl 10x PCR buffer (100
mM Tris.Cl pH 8.5, 500 mM KCl, 15 mM MgCl2); 10 µl 2 mM dNTPs
(made from 100 mM dNTP stocks); 2.5 U Taq polymerase (Perkin
Elmer AmpliTaq™, 5 U/µl); and H2O to 100 µl. The cycling
conditions are usually 40 cycles (94°C 45 sec, 55°C 30 sec,
72°C 60 sec) but may need to be varied considerably from
sample type to sample type. These conditions are for 0.2 mL
thin wall tubes in a Perkin Elmer 9600 thermocycler. See
Perkin Elmer 1992/93 catalogue for 9600 cycle time
information. Target, primer length and sequence composition,
among other factors, may also affect parameters.

For products in the 200 to 1000 bp size range, check
2 µl of the reaction on a 1.5% 0.5x TBE agarose gel using an
appropriate size standard (phiX174 cut with HaeIII is
convenient). The PCR reaction should yield several picomoles
of product. It is helpful to include a negative control
(i.e., 1 µl TE instead of genomic DNA) to check for possible
contamination. To avoid contamination, keep PCR products from
previous experiments away from later reactions, using filter
tips as appropriate. Using a set of working solutions and
storing master solutions separately is helpful, so long as one
does not contaminate the master stock solutions.

For simple amplifications of short fragments from
genomic DNA it is, in general, unnecessary to optimize Mg2+
concentrations. A good procedure is the following: make a
master mix minus enzyme; dispense the genomic DNA samples to
individual tubes or reaction wells; add enzyme to the master
mix; and mix and dispense the master solution to each well,
using a new filter tip each time.
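
The master mix volumes implied by the per-reaction recipe of
this section can be worked out as in the following sketch.
The per-reaction volumes are those stated above (2.5 U of Taq
at 5 U/µl being 0.5 µl); the 10% pipetting overage, the
dictionary keys and the function names are assumptions made
for illustration and are not part of the described procedure.

```python
# Per-reaction volumes in microliters, excluding the 1 ul of genomic DNA
# that is dispensed separately into each tube or well.
MASTER_MIX_UL = {
    "primer 1 (10 pmol/ul)": 10.0,
    "primer 2 (10 pmol/ul)": 10.0,
    "10x PCR buffer": 10.0,
    "2 mM dNTPs": 10.0,
    "Taq polymerase (5 U/ul)": 0.5,  # 2.5 U; added to the mix last
}
TEMPLATE_UL = 1.0
TOTAL_UL = 100.0


def master_mix(n_reactions, overage=0.1):
    """Return component volumes (ul) for a master mix of n reactions.

    `overage` is an assumed allowance for pipetting losses (10% here).
    Water makes up the remainder of each 100 ul reaction.
    """
    water = TOTAL_UL - TEMPLATE_UL - sum(MASTER_MIX_UL.values())
    scale = n_reactions * (1.0 + overage)
    mix = {name: volume * scale for name, volume in MASTER_MIX_UL.items()}
    mix["H2O"] = water * scale
    return mix


def dispense_volume():
    """Volume of master mix (ul) to add to each 1 ul DNA sample."""
    return TOTAL_UL - TEMPLATE_UL


if __name__ == "__main__":
    for name, volume in master_mix(8).items():
        print(f"{name:26s}{volume:8.1f} ul")
    print(f"dispense {dispense_volume():.0f} ul per well")
```

With these volumes each reaction contains 58.5 µl of water,
and 99 µl of master mix is dispensed onto each 1 µl DNA
sample.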

2. PURIFICATION 

Removal of unincorporated nucleotides and primers
from PCR samples can be accomplished using the Promega Magic
PCR Preps DNA purification kit. One can purify the whole
sample, following the instructions supplied with the kit
(proceed from section IIIB, 'Sample preparation for direct
purification from PCR reactions'). After elution of the PCR
product in 50 µl of TE or H2O, one centrifuges the eluate for
20 sec at 12,000 rpm in a microfuge and carefully transfers 45
µl to a new microfuge tube, avoiding any visible pellet. Resin
is sometimes carried over during the elution step. This
transfer prevents accidental contamination of the linear
amplification reaction with 'Magic PCR' resin. Other methods,
e.g., size exclusion chromatography, may also be used.

3. LINEAR AMPLIFICATION 

In a 0.2 mL thin-wall PCR tube mix: 4 µl purified
PCR product; 2 µl primer (10 pmol/µl); 4 µl 10x PCR buffer; 4




µl dNTPs (2 mM dA, dC, dG, 0.1 mM dT); 4 µl 0.1 mM dUTP; 1 µl
1 mM fluorescein dUTP (Amersham RPN 2121); 1 U Taq polymerase
(Perkin Elmer, 5 U/µl); and add H2O to 40 µl. Conduct 40
cycles (92°C 30 sec, 55°C 30 sec, 72°C 90 sec) of PCR. These
conditions have been used to amplify a 300 nucleotide
mitochondrial DNA fragment but are applicable to other
fragments. Even in the absence of a visible product band on
an agarose gel, there should still be enough product to give
an easily detectable hybridization signal. If one is not
treating the DNA with uracil DNA glycosylase (see Section 4),
dUTP can be omitted from the reaction.

4. FRAGMENTATION 

Purify the linear amplification product using the
Promega Magic PCR Preps DNA purification kit, as per Section 2
above. In a 0.2 mL thin-wall PCR tube mix: 40 µl purified
labeled DNA; 4 µl 10x PCR buffer; and 0.5 µl uracil DNA
glycosylase (BRL, 1 U/µl). Incubate the mixture 15 min at 37°C,
then 10 min at 97°C; store at -20°C until ready to use.

5. HYBRIDIZATION, SCANNING & STRIPPING 

A blank scan of the slide in hybridization buffer
only is helpful to check that the slide is ready for use. The
buffer is removed from the flow cell and replaced with 1 ml of
(fragmented) DNA in hybridization buffer and mixed well.
Optionally, standard hybridization buffer can be
supplemented with tetramethylammonium chloride (TMACl) or
betaine (N,N,N-trimethylglycine; (CH3)3N+CH2COO-) to improve
discrimination between perfectly matched targets and
single-base mismatches. Betaine is zwitterionic at neutral pH and
alters the composition-dependent stability of nucleic acids
without altering their polyelectrolyte behavior. Betaine is
preferably used at a concentration between 1 and 10 M and,
optimally, at about 5 M. For example, 5 M betaine in 2x SSPE
is suitable. Inclusion of betaine at this concentration
lowers the average hybridization signal about fourfold, but
increases the discrimination between matched and mismatched
probes.

The scan is performed in the presence of the labeled
target. Fig. 21 shows an illustrative detection system
for scanning a DNA chip. A series of scans at 30 min
intervals using a hybridization temperature of 25°C yields a
very clear signal, usually within 30 min to two hours,
but it may be desirable to hybridize longer, e.g., overnight.
Using a laser power of 50 µW and 50 µm pixels, one should
obtain maximum counts in the range of hundreds to low
thousands/pixel for a new slide. When finished, the slide can
be stripped using 50% formamide, rinsing well in deionized
H2O, blowing dry, and storing at room temperature.

C. PREPARATION OF LABELED RNA/HYBRIDIZATION TO ARRAY

1. TAGGED PRIMERS 

The primers used to amplify the target nucleic acid
should have promoter sequences if one desires to produce RNA
from the amplified nucleic acid. Suitable promoter sequences
are shown below and include:

(1) the T3 promoter sequence:
5'-CGGAATTAACCCTCACTAAAGG
5'-AATTAACCCTCACTAAAGGGAG;

(2) the T7 promoter sequence:
5'-TAATACGACTCACTATAGGGAG;

and (3) the SP6 promoter sequence:
5'-ATTTAGGTGACACTATAGAA.

The desired promoter sequence is added to the 5' end
of the PCR primer. It is convenient to add a different
promoter to each primer of a PCR primer pair so that either
strand may be transcribed from a single PCR product.
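
Construction of a promoter-tagged primer pair can be sketched
as follows. The promoter sequences are those listed above
(the second of the two T3 sequences is used); the two 20
nucleotide gene-specific sequences, the dictionary and the
function name are hypothetical placeholders chosen only for
illustration.

```python
# Promoter sequences as listed in the description, written 5' to 3'.
PROMOTERS = {
    "T3": "AATTAACCCTCACTAAAGGGAG",
    "T7": "TAATACGACTCACTATAGGGAG",
    "SP6": "ATTTAGGTGACACTATAGAA",
}


def tag_primer(promoter, gene_specific):
    """Prepend the named promoter to the 5' end of a gene-specific primer."""
    return PROMOTERS[promoter] + gene_specific.upper()


if __name__ == "__main__":
    # Hypothetical 20-nt gene-specific primers (not taken from any gene).
    forward = "ACGTTGCAAGCTTGACCTGA"
    reverse = "TTGACCGGTAACGTCATGCA"

    # A different promoter on each primer lets either strand be transcribed.
    tagged_forward = tag_primer("T7", forward)
    tagged_reverse = tag_primer("T3", reverse)
    print(tagged_forward, len(tagged_forward))
    print(tagged_reverse, len(tagged_reverse))
```

With a 22 nucleotide promoter and a 20 nucleotide
gene-specific portion, each tagged primer in this example is
a 42 mer.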

Synthesize PCR primers so as to leave the DMT group
on. DMT-on purification is unnecessary for PCR but appears to
be important for transcription. Add 25 µl 0.5 M NaOH to the
collection vial prior to collection of oligonucleotide to keep
the DMT group on. Deprotect using standard chemistry; 55°C
overnight is convenient.

HPLC purification is accomplished by drying down the
oligonucleotides, resuspending in 1 mL 0.1 M TEAA (dilute 2.0
M stock in deionized water, filter through 0.2 micron filter)
and filtering through a 0.2 micron filter. Load 0.5 mL on reverse
phase HPLC (column can be a Hamilton PRP-1 semi-prep, #79426).
The gradient is 0 -> 50% CH3CN over 25 min (program 0.2
µmol prep. 0-50, 25 min). Pool the desired fractions, dry
down, resuspend in 200 µl 80% HAc, and leave 30 min at RT. Add
200 µl EtOH; dry down. Resuspend in 200 µl H2O, plus 20 µl NaAc
pH 5.5, 600 µl EtOH. Leave 10 min on ice; centrifuge 12,000
rpm for 10 min in microfuge. Pour off supernatant. Rinse
pellet with 1 mL EtOH, dry, resuspend in 200 µl H2O. Dry,
resuspend in 200 µl TE. Measure A260, prepare a 10 pmol/µl
solution in TE (10 mM Tris.Cl pH 8.0, 0.1 mM EDTA). Following
HPLC purification of a 42 mer, a yield in the vicinity of 15
nmol from a 0.2 µmol scale synthesis is typical.

2. GENOMIC DNA PREPARATION

Add 500 µl (10 mM Tris.Cl pH 8.0, 10 mM EDTA, 100 mM
NaCl, 2% (w/v) SDS, 40 mM DTT, filter sterilized) to the
sample. Add 1.25 µl 20 mg/ml proteinase K (Boehringer).
Incubate at 55°C for 2 hours, vortexing once or twice.
Perform 2x 0.5 mL 1:1 phenol:CHCl3 extractions. After each
extraction, centrifuge 12,000 rpm 5 min in a microfuge and
recover 0.4 mL supernatant. Add 35 µl NaAc pH 5.2 plus 1 mL
EtOH. Place sample on ice 45 min; then centrifuge 12,000 rpm
30 min, rinse, air dry 30 min, and resuspend in 100 µl TE.

3. PCR

PCR is performed in a mixture containing, per
reaction: 1 µl genomic DNA; 4 µl each primer (10 pmol/µl
stocks); 4 µl 10x PCR buffer (100 mM Tris.Cl pH 8.5, 500 mM
KCl, 15 mM MgCl2); 4 µl 2 mM dNTPs (made from 100 mM dNTP
stocks); 1 U Taq polymerase (Perkin Elmer, 5 U/µl); H2O to 40
µl. About 40 cycles (94°C 30 sec, 55°C 30 sec, 72°C 30 sec)
are performed, but cycling conditions may need to be varied.
These conditions are for 0.2 mL thin wall tubes in a Perkin
Elmer 9600. For products in the 200 to 1000 bp size range,
check 2 µl of the reaction on a 1.5% 0.5x TBE agarose gel using
an appropriate size standard. For larger or smaller volumes
(20 - 100 µl), one can use the same amount of genomic DNA but
adjust the other ingredients accordingly.

4. IN VITRO TRANSCRIPTION 

Mix: 3 µl PCR product; 4 µl 5x buffer; 2 µl DTT;
2.4 µl 10 mM rNTPs (100 mM solutions from Pharmacia); 0.48 µl
10 mM fluorescein-UTP (Fluorescein-12-UTP, 10 mM solution,
from Boehringer Mannheim); 0.5 µl RNA polymerase (Promega T3
or T7 RNA polymerase); and add H2O to 20 µl. Incubate at 37°C
for 3 h. Check 2 µl of the reaction on a 1.5% 0.5x TBE agarose
gel using a size standard. 5x buffer is 200 mM Tris pH 7.5,
30 mM MgCl2, 10 mM spermidine, 50 mM NaCl, and 100 mM DTT
(supplied with enzyme). The PCR product needs no purification
and can be added directly to the transcription mixture. A 20
µl reaction is suggested for an initial test experiment and
hybridization; a 100 µl reaction is considered "preparative"
scale (the reaction can be scaled up to obtain more target).

The amount of PCR product to add is variable; typically a PCR
reaction will yield several picomoles of DNA. If the PCR
reaction does not produce that much target, then one should
increase the amount of DNA added to the transcription reaction
(as well as optimize the PCR). The ratio of fluorescein-UTP
to UTP suggested above is 1:5, but ratios from 1:3 to 1:10
all work well. One can also label with biotin-UTP and detect
with streptavidin-FITC to obtain results similar to those
with fluorescein-UTP detection.
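
The 1:5 ratio follows from the volumes given in this section,
as the following sketch shows. It assumes that the 10 mM rNTP
mixture contains each rNTP, including UTP, at 10 mM; the
function names are hypothetical.

```python
def labeled_utp_volume(rntp_ul, rntp_mm, label_mm, ratio):
    """Volume (ul) of labeled UTP stock giving labeled:unlabeled UTP = 1:ratio.

    rntp_ul  -- volume of the rNTP mixture added to the reaction (ul)
    rntp_mm  -- concentration of UTP in that mixture (mM), assumed equal
                to the stated concentration of the mixture
    label_mm -- concentration of the labeled UTP stock (mM)
    """
    return rntp_ul * rntp_mm / (ratio * label_mm)


def final_concentration(stock_ul, stock_mm, total_ul):
    """Final concentration (mM) after dilution into the reaction volume."""
    return stock_ul * stock_mm / total_ul


if __name__ == "__main__":
    # Values from the 20 ul transcription reaction described above.
    print(labeled_utp_volume(2.4, 10.0, 10.0, 5))   # volume for a 1:5 ratio
    print(final_concentration(2.4, 10.0, 20.0))     # unlabeled UTP, mM
    print(final_concentration(0.48, 10.0, 20.0))    # fluorescein-UTP, mM
    # Range of volumes for the 1:3 to 1:10 ratios said to work well.
    print(labeled_utp_volume(2.4, 10.0, 10.0, 3),
          labeled_utp_volume(2.4, 10.0, 10.0, 10))
```

Under the stated assumption, 0.48 µl of 10 mM fluorescein-UTP
against 2.4 µl of 10 mM rNTPs gives final concentrations of
0.24 mM labeled and 1.2 mM unlabeled UTP in the 20 µl
reaction. Ratios of 1:3 and 1:10 would correspond to 0.8 µl
and 0.24 µl of the labeled stock, respectively.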

For nondenaturing agarose gel electrophoresis of
RNA, note that the RNA band will normally migrate somewhat
faster than the DNA template band, although sometimes the two
bands will comigrate. The temperature of the gel can affect
the migration of the RNA band. The RNA produced from in vitro
transcription is quite stable and can be stored for months (at
least) at -20°C without any evidence of degradation. It can
be stored in unsterilized 6x SSPE 0.1% Triton X-100 at -20°C
for days (at least) and reused twice (at least) for
hybridization, without taking any special precautions in
preparation or during use. RNase contamination should of
course be avoided. When extracting RNA from cells, it is
preferable to work very rapidly and to use strongly denaturing
conditions. Avoid using glassware previously contaminated
with RNases. Use of new disposable plasticware (not
necessarily sterilized) is preferred, as new plastic tubes,
tips, etc., are essentially RNase free. Treatment with DEPC
or autoclaving is typically not necessary.

5. FRAGMENTATION

Heat transcription mixture at 94°C for 40
min. The extent of fragmentation is controlled by varying
Mg2+ concentration (30 mM is typical), temperature, and
duration of heating.

6. HYBRIDIZATION, SCANNING & STRIPPING 
A blank scan of the slide in hybridization buffer 
only is helpful to check that the slide is ready for use. The 
buffer is removed from the flow cell and replaced with 1 mL of 
(hydrolysed) RNA in hybridization buffer and mixed well. 
Incubate for 15-30 min at 18°C. Remove the hybridization 
solution, which can be saved for subsequent experiments. 
Rinse the flow cell 4-5 times with fresh changes of 6 x SSPE 
0.1% Triton X-100, equilibrated to 18°C. The rinses can be 
performed rapidly, but it is important to empty the flow cell 
before each new rinse and to mix the liquid in the cell 
thoroughly. A series of scans at 30 min intervals using a
hybridization temperature of 25°C yields a very clear signal,
usually within 30 min to two hours, but it may be
desirable to hybridize longer, e.g., overnight. Using a laser
power of 50 µW and 50 µm pixels, one should obtain maximum
counts in the range of hundreds to low thousands/pixel for a
new slide. When finished, the slide can be stripped using
warm water.

These conditions are illustrative and assume a probe
length of ~15 nucleotides. The stripping conditions suggested
are fairly severe, but some signal may remain on the slide if 
the washing is not stringent. Nevertheless, the counts 
remaining after the wash should be very low in comparison to 
the signal in presence of target RNA. In some cases, much 
gentler stripping conditions are effective. The lower the 
hybridization temperature and the longer the duration of 
hybridization, the more difficult it is to strip the slide. 
Longer targets may be more difficult to strip than shorter 
targets. 

7. AMPLIFICATION OF SIGNAL

A variety of methods can be used to enhance
detection of labelled targets bound to a probe on the array.
In one embodiment, the protein MutS (from E. coli) or
equivalent proteins such as yeast MSH1, MSH2, and MSH3; mouse
Rep-3; and Streptococcus Hex-A, is used in conjunction with
target hybridization to detect probe-target complexes that
contain mismatched base pairs. The protein, labeled directly
or indirectly, can be added to the chip during or after
hybridization of target nucleic acid, and differentially binds
to homo- and heteroduplex nucleic acid. A wide variety of
dyes and other labels can be used for similar purposes. For
instance, the dye YOYO-1 is known to bind preferentially to
nucleic acids containing sequences comprising runs of 3 or
more G residues.

8. DETECTION OF REPEAT SEQUENCES

In some circumstances, e.g., target nucleic acids
with repeated sequences or with high G/C content, very long
probes are sometimes required for optimal detection. In one
embodiment for detecting specific sequences in a target
nucleic acid with a DNA chip, repeat sequences are detected as
follows. The chip comprises probes of length sufficient to
extend into the repeat region varying distances from each end.
The sample, prior to hybridization, is treated with a labelled
oligonucleotide that is complementary to a repeat region but
shorter than the full length of the repeat. The target
nucleic acid is labelled with a second, distinct label. After
hybridization, the chip is scanned for probes that have bound
both the labelled target and the labelled oligonucleotide
probe; the presence of such bound probes shows that at least
two repeat sequences are present.

While the foregoing invention has been described in
some detail for purposes of clarity and understanding, it will
be clear to one skilled in the art from a reading of this
disclosure that various changes in form and detail can be made
without departing from the true scope of the invention. All
publications and patent documents cited in this application
are incorporated by reference in their entirety for all
purposes to the same extent as if each individual publication
or patent document were so individually denoted.

WHAT IS CLAIMED IS:

1. A method of determining copy number of a gene present
in an individual, comprising:

analyzing a plurality of polymorphic sites in a
chromosome containing a gene from an individual to determine
the number of different polymorphic forms present at each
site; and

assigning the copy number of the gene as the highest
number of polymorphic forms present at a single site.

2. The method of claim 1, wherein the plurality of
polymorphic sites are in a noncoding segment of the gene.

3. The method of claim 1, wherein the plurality of 
polymorphic sites are silent polymorphisms. 

4. The method of claim 3, wherein the at least one
polymorphic site is present in an intronic segment of the
gene.

5. The method of claim 1, wherein the plurality of
polymorphic sites comprises at least 10 sites.

6. The method of claim 1, wherein the plurality of
polymorphic sites comprises at least 50 sites.



7. The method of claim 1, further comprising: 
obtaining a tissue sample from the individual containing 

the gene and amplifying the gene or a fragment thereof. 

8. The method of claim 1, wherein the analyzing
comprises:

contacting a nucleic acid comprising the gene or a
fragment thereof with an array of oligonucleotides, the array
comprising a plurality of subarrays, each subarray spanning a
polymorphic site and complementary to at least one
polymorphic form of the gene at the site;

detecting hybridization intensities of the nucleic
acid to the oligonucleotides in the array, whereby the pattern
of hybridization indicates the number of polymorphic forms
present at each polymorphic site.

9. The method of claim 8, wherein the subarrays each
comprise a plurality of probe groups, each probe group
complementary to a different polymorphic form at the site.

10. The method of claim 9, wherein a probe group
comprises

(a) a first probe set comprising a plurality of probes
spanning a polymorphic site of the gene, each probe comprising
a segment of at least six nucleotides exactly complementary to
a polymorphic form of the gene at the site, the segment
including at least one interrogation position complementary to
a corresponding nucleotide in the polymorphic form,

(b) a second probe set comprising a corresponding probe
for each probe in the first probe set, the corresponding probe
in the second probe set being identical to a sequence
comprising the corresponding probe from the first probe set or
a subsequence of at least six nucleotides thereof that
includes the at least one interrogation position, except that
the at least one interrogation position is occupied by a
different nucleotide in each of the two corresponding probes
from the first and second probe sets.

11. The method of claim 9, wherein a probe group
comprises

(a) a first probe set comprising a plurality of probes
spanning a polymorphic site, each probe comprising a segment
of at least six nucleotides exactly complementary to a
subsequence of a polymorphic form at the site, the segment
including at least one interrogation position complementary to
a corresponding nucleotide in the polymorphic form,

(b) second, third and fourth probe sets, each comprising
a corresponding probe for each probe in the first probe set,
the probes in the second, third and fourth probe sets being
identical to a sequence comprising the corresponding probe
from the first probe set or a subsequence of at least six
nucleotides thereof that includes the at least one
interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the four corresponding probes from the four probe
sets.

12. The method of claim 1, wherein a single polymorphic 
form is present at each of the plurality of sites and the copy 
number of the gene is assigned as 1. 

13. The method of claim 1, wherein two polymorphic forms 
are present at one site and a single polymorphic form is 
present at each other of the plurality of sites, and the copy 
number of the gene is assigned as 2. 

14. The method of claim 1, wherein three polymorphic
forms are present at a first polymorphic site, a single
polymorphic form is present at a second polymorphic site and
two polymorphic forms are present at a third polymorphic site
and the copy number of the gene is assigned as 3.
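
By way of illustration only, and without limiting the claims,
the assignment rule of claim 1 and the cases of claims 12 to
14 can be sketched as follows. The site names, the bases
shown and the function name are hypothetical.

```python
def copy_number(forms_by_site):
    """Assign copy number as the highest number of distinct polymorphic
    forms observed at any single site.

    `forms_by_site` maps a site identifier to the collection of
    polymorphic forms detected at that site.
    """
    if not forms_by_site:
        raise ValueError("at least one polymorphic site is required")
    return max(len(set(forms)) for forms in forms_by_site.values())


if __name__ == "__main__":
    # One form at every site: copy number 1 (as in claim 12).
    print(copy_number({"site1": ["A"], "site2": ["G"], "site3": ["T"]}))
    # Two forms at one site, one elsewhere: copy number 2 (claim 13).
    print(copy_number({"site1": ["A", "C"], "site2": ["G"], "site3": ["T"]}))
    # Three, one and two forms at three sites: copy number 3 (claim 14).
    print(copy_number({"site1": ["A", "C", "G"], "site2": ["G"],
                       "site3": ["T", "C"]}))
```

The value so assigned is the number of copies that can be
distinguished from the sites analyzed; copies that are
identical at every analyzed site are not resolved by this
rule.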

15. The method of claim 1, further comprising analyzing
a phenotype-determining polymorphic site in the gene to
determine which polymorphic form(s) are present at the site.

16. The method of claim 15, further comprising
diagnosing a phenotype of the patient based on the polymorphic
form(s) present at the phenotype-determining polymorphic site.

17. A method of analyzing a polymorphic site in a gene 
of an individual, comprising 

(a) hybridizing a sample comprising a target nucleic
acid comprising one or more alleles of the gene to an array of
oligonucleotide probes immobilized on a solid support, the
array comprising:

(1) a first probe set comprising a plurality of
probes, each probe comprising a segment of at least six
nucleotides exactly complementary to a reference form of the
gene, the segment including at least one interrogation
position complementary to a corresponding nucleotide in the
reference form of the gene, the reference form of the gene
having a silent polymorphic site and a site of potential
mutation associated with a phenotypic change;

(2) second, third and fourth probe sets, each
comprising a corresponding probe for each probe in the first
probe set, the probes in the second, third and fourth probe
sets being identical to a sequence comprising the
corresponding probe from the first probe set or a subsequence
of at least six nucleotides thereof that includes the at least
one interrogation position, except that the at least one
interrogation position is occupied by a different nucleotide
in each of the four corresponding probes from the four probe
sets; and

(b) determining which probes, relative to one another,
bind to the target nucleic acid, whereby the relative binding
of probes having an interrogation position aligned with the
silent polymorphism indicates the number of different alleles
of the gene in the sample and the relative binding of probes
having an interrogation position aligned with the mutation
indicates whether the mutation is present in at least one of
the alleles.



18. The method of claim 17, wherein the determining
comprises:

(1) comparing the relative specific binding of four
corresponding probes from the first, second, third and fourth
probe sets;

(2) assigning a nucleotide in the target sequence
as the complement of the interrogation position of the probe
having the greatest specific binding;

(3) repeating (1) and (2) until each nucleotide of
interest in the target sequence has been assigned.

19. An array of oligonucleotide probes immobilized on a
solid support, the array comprising at least two sets of 
oligonucleotide probes, 

(1) a first probe set comprising a plurality of 
probes, each probe comprising a segment of at least three 
nucleotides exactly complementary to a subsequence of a 
reference sequence, the segment including at least one 
interrogation position complementary to a corresponding 
nucleotide in the reference sequence, 

(2) a second probe set comprising a corresponding 
probe for each probe in the first probe set, the corresponding 
probe in the second probe set being identical to a sequence 
comprising the corresponding probe from the first probe set or 
a subsequence of at least three nucleotides thereof that 
includes the at least one interrogation position, except that 
the at least one interrogation position is occupied by a 
different nucleotide in each of the two corresponding probes 
from the first and second probe sets; 

wherein the probes in the first probe set have at
least three interrogation positions respectively corresponding
to each of three contiguous nucleotides in the reference
sequence;

provided that the array does not contain a complete 
set of probes of a given length; 

wherein the reference sequence is from a 
biotransformation gene. 

20. An array of oligonucleotide probes immobilized on a 
solid support, the array comprising at least four sets of 
oligonucleotide probes, 

(1) a first probe set comprising a plurality of 
probes, each probe comprising a segment of at least three 
nucleotides exactly complementary to a subsequence of a 
reference sequence, the segment including at least one 
interrogation position complementary to a corresponding 
nucleotide in the reference sequence, 

(2) second, third and fourth probe sets, each 
comprising a corresponding probe for each probe in the first 
probe set, the probes in the second, third and fourth probe
sets being identical to a sequence comprising the 
corresponding probe from the first probe set or a subsequence 
of at least three nucleotides thereof that includes the at 
least one interrogation position, except that the at least one 
interrogation position is occupied by a different nucleotide 
in each of the four corresponding probes from the four probe 
sets; 

provided the array lacks a complete set of probes of 
a given length; 

wherein the reference sequence is from a 
biotransformation gene. 

21. The array of claim 19 or 20, wherein the reference
sequence is from a gene encoding an enzyme selected from the
group consisting of a cytochrome P450, N-acetyl transferase
II, glucose 6-phosphate dehydrogenase, pseudocholinesterase,
catechol-O-methyl transferase, and dihydropyridine
dehydrogenase.

22. The array of claim 21, wherein the reference
sequence is from a gene encoding an enzyme selected from the
group consisting of a cytochrome P450, N-acetyl transferase
II, glucose 6-phosphate dehydrogenase, pseudocholinesterase,
catechol-O-methyl transferase, and dihydropyridine
dehydrogenase.

23. The array of claim 22, wherein the enzyme is P450 
2D6 or P450 2C19. 

24. The array of claim 19 or 20, wherein the reference 
sequence includes a site of a mutation and a site of a silent 
polymorphism. 

25. The array of claim 24, wherein the silent 
polymorphism is in an intron or flanking region of a gene. 

26. The array of claim 19 or 20, wherein the first probe
set has at least 3 interrogation positions respectively 
corresponding to each of 3 contiguous nucleotides in the 
reference sequence. 

27. The oligonucleotide array of claim 19 or 20, wherein 
the array has between 100 and 100,000 probes. 

28. The oligonucleotide array of claim 19 or 20, wherein 
the probes are linked to the support via a spacer. 

29. The oligonucleotide array of claim 19 or 20, wherein 
the segment in each probe of the first probe set that is 
exactly complementary to the subsequence of the reference 
sequence is 9-21 nucleotides. 

30. An array of oligonucleotide probes immobilized on a 
solid support, the array comprising at least one pair of first 
and second probe groups, each group comprising a first and 
second sets of oligonucleotide probes as defined by claim 19; 

wherein each probe in the first probe set from the 
first group is exactly complementary to a subsequence of a 
first reference sequence and each probe in the first probe set 
from the second group is exactly complementary to a 
subsequence from a second reference sequence. 

31. The array of claim 30, wherein each group further 
comprises third and fourth probe sets, each comprising a 
corresponding probe for each probe in the first probe set, the 
probes in the second, third and fourth probe sets being 
identical to a sequence comprising the corresponding probe 
from the first probe set or a subsequence of at least three 
nucleotides thereof that includes the interrogation position, 
except that the interrogation position is occupied by a 
different nucleotide in each of the four corresponding probes 
from the four probe sets. 
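
As an illustration only, the four corresponding probe sets described above can be sketched in Python. The function name `tile_probes`, the 15-nucleotide probe length and the central interrogation offset are assumptions chosen for the example and are not taken from the claims. Each probe is written so that index k pairs with index k of the reference window.

```python
# Hypothetical sketch: build four corresponding probes per interrogated position.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def tile_probes(reference, probe_len=15, interrogation_offset=7):
    """Return one tile per reference window.

    Each tile holds four probes keyed by the nucleotide placed at the
    interrogation position. The probe whose key equals ``matched_base``
    is exactly complementary to the window (first probe set); the other
    three differ only at the interrogation position.
    """
    tiles = []
    for start in range(len(reference) - probe_len + 1):
        window = reference[start:start + probe_len]
        # Exact complement of the window, position by position.
        matched = [COMPLEMENT[base] for base in window]
        probes = {}
        for base in "ACGT":
            variant = matched.copy()
            variant[interrogation_offset] = base
            probes[base] = "".join(variant)
        tiles.append({
            "ref_pos": start + interrogation_offset,
            "matched_base": matched[interrogation_offset],
            "probes": probes,
        })
    return tiles
```

For a reference of length n this yields n - probe_len + 1 tiles, one for each nucleotide that can be placed at the chosen offset within a full-length window.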

32. The array of claim 30, wherein the first reference 
sequence includes the site of a mutation in the 
biotransformation gene, and the second reference sequence 
includes a site of a silent polymorphism within the 
biotransformation gene or flanking the biotransformation gene. 

33. The array of claim 32, wherein the reference 
sequence is from a gene encoding an enzyme selected from the 
group consisting of a cytochrome P450, N-acetyl transferase 
II, glucose 6-phosphate dehydrogenase, pseudocholinesterase, 
catechol-O-methyl transferase, and dihydropyridine 
dehydrogenase. 

34. The array of claim 32 that comprises at least 
forty pairs of first and second probe groups, wherein the 
probes in the first probe sets from the first groups of the 
forty pairs are exactly complementary to subsequences from 
forty respective first reference sequences. 

35. A block of oligonucleotide probes immobilized on a 
solid support, comprising: 

a perfectly matched probe comprising a segment of at 
least three nucleotides exactly complementary to a subsequence 
of a reference sequence, the segment having a plurality of 
interrogation positions respectively corresponding to a 
plurality of nucleotides in the reference sequence, 

for each interrogation position, three mismatched probes, 
each identical to a sequence comprising the perfectly matched 
probe or a subsequence of at least three nucleotides thereof 
including the plurality of interrogation positions, except in 
the interrogation position, which is occupied by a different 
nucleotide in each of the three mismatched probes and the 
perfectly matched probe; 

provided the array lacks a complete set of probes of a 
given length; 

wherein the reference sequence is from a 
biotransformation gene. 

36. The array of claim 34, wherein the segment of the 
perfectly matched probe comprises 3-20 interrogation positions 
corresponding to 3-20 respective nucleotides in the reference 
sequence. 

37. An array of probes immobilized to a solid support 
comprising at least two blocks of probes, each block as 
defined by claim 34, a first block comprising a perfectly 
matched probe comprising a segment exactly complementary to a 
subsequence of a first reference sequence and a second block 
comprising a perfectly matched probe comprising a segment 
exactly complementary to a subsequence of a second reference 
sequence. 



38. The array of claim 37, wherein the first reference 
sequence is from a wildtype 2D6 gene and the second reference 
sequence is from a mutant 2D6 gene. 

39. The array of claim 37, comprising at least 10-100 
blocks of probes, each comprising a perfectly matched probe 
comprising a segment exactly complementary to a subsequence of 
at least 10-100 respective reference sequences. 

40. An array of oligonucleotide probes immobilized on a 
solid support, the array comprising at least four probes: 

a first probe comprising first and second segments, each 
of at least three nucleotides and exactly complementary to 
first and second subsequences of a reference sequence, the 
segments including at least one interrogation position 
corresponding to a nucleotide in the reference sequence, 
wherein either (1) the first and second subsequences are 
noncontiguous, or (2) the first and second subsequences are 
contiguous and the first and second segments are inverted 
relative to the complement of the first and second 
subsequences in the reference sequence; 

second, third and fourth probes, identical to a sequence 
comprising the first probe or a subsequence thereof comprising 
at least three nucleotides from each of the first and second 
which differs in each of the probes; 

provided the array lacks a complete set of probes of a 
given length; 

wherein the reference sequence is from a 
biotransformation gene. 

41. A method of comparing a target nucleic acid with a 
reference sequence comprising a predetermined sequence of 
nucleotides, the method comprising: 

(a) hybridizing a sample comprising the target 
nucleic acid to an array of oligonucleotide probes immobilized 
on a solid support, the array comprising: 

(1) a first probe set comprising a plurality of 
probes, each probe comprising a segment of at least three 
nucleotides exactly complementary to a subsequence of the 
reference sequence, the segment including at least one 
interrogation position complementary to a corresponding 
nucleotide in the reference sequence, wherein the reference 
sequence is from a biotransformation gene; 

(2) a second probe set comprising a corresponding 
probe for each probe in the first probe set, the corresponding 
probe in the second probe set being identical to a sequence 
comprising the corresponding probe from the first probe set or 
a subsequence of at least three nucleotides thereof that 
includes the at least one interrogation position, except that 
the at least one interrogation position is occupied by a 
different nucleotide in each of the two corresponding probes 
from the first and second probe sets; 

wherein, the probes in the first probe set have at 
least three interrogation positions respectively corresponding 
to each of at least three nucleotides in the reference 
sequence, and 

(b) determining which probes, relative to one another, in 
the first and second probe sets specifically bind to the 
target nucleic acid, the relative specific binding of 
corresponding probes in the first and second probe sets 
indicating whether a nucleotide in the target sequence is the 
same or different from the corresponding nucleotide in the 
reference sequence. 

42. The method of claim 41, wherein the determining step 
comprises: 

(1) comparing the relative specific binding of two 
corresponding probes from the first and second probe sets; 

(2) assigning a nucleotide in the target sequence 
as the complement of the interrogation position of the probe 
having the greater specific binding; and 

(3) repeating (1) and (2) until each nucleotide of 
interest in the target sequence has been assigned. 

43. The method of claim 41, wherein the array further 
comprises third and fourth probe sets, each comprising a 
corresponding probe for each probe in the first probe set, the 
probes in the second, third and fourth probe sets being 
identical to a sequence comprising the corresponding probe 
from the first probe set or a subsequence of at least three 
nucleotides thereof that includes the at least one 
interrogation position, except that the at least one 
interrogation position is occupied by a different nucleotide 
in each of the four corresponding probes from the four probe 
sets; and the determining step comprises determining which 
probes, relative to one another, in the first, second, third 
and fourth probe sets specifically bind to the target nucleic 
acid, the relative specific binding of corresponding probes in 
the first, second, third and fourth probe sets indicating 
whether a nucleotide in the target sequence is the same or 
different from the corresponding nucleotide in the reference 
sequence. 

44. The method of claim 43, wherein: 

the reference sequence includes a site of a mutation 
in the biotransformation gene and a silent polymorphism in or 
flanking the biotransformation gene; 

the target nucleic acid comprises one or more 
different alleles of the biotransformation gene; and 

the relative specific binding of probes having an 
interrogation position aligned with the silent polymorphism 
indicates the number of different alleles and the relative 
specific binding of probes having an interrogation position 
aligned with the mutation indicates whether the mutation is 
present in at least one of the alleles. 
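
The use of a silent polymorphism to count distinct alleles can be illustrated with a minimal Python sketch. The function name `alleles_at_site` and the `min_fraction` threshold of 0.5 are hypothetical choices made for this example; the claims do not specify how relative binding is thresholded.

```python
# Hypothetical sketch: infer which nucleotides are present at a polymorphic site.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def alleles_at_site(signals, min_fraction=0.5):
    """Return the target nucleotides supported at one interrogated site.

    ``signals`` maps the nucleotide at each probe's interrogation position
    to its measured binding signal. A probe counts as bound when its signal
    is at least ``min_fraction`` of the strongest signal (an assumed rule).
    """
    top = max(signals.values())
    if top <= 0:
        return []  # no usable signal at this site
    return sorted(
        COMPLEMENT[base]
        for base, signal in signals.items()
        if signal >= min_fraction * top
    )
```

Under these assumptions, two nucleotides returned at the silent polymorphism suggest at least two different alleles in the sample, while a single nucleotide is consistent with one allele or with several alleles sharing that nucleotide.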



45. The method of claim 43, wherein the determining 
comprises: 

(1) comparing the relative specific binding of four 
corresponding probes from the first, second, third and fourth 
probe sets; 

(2) assigning a nucleotide in the target sequence 
as the complement of the interrogation position of the probe 
having the greatest specific binding; 

(3) repeating (1) and (2) until each nucleotide of 
interest in the target sequence has been assigned. 
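
Steps (1) to (3) above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the names `call_base` and `compare_to_reference` are hypothetical, signals are assumed to be already background-corrected numbers, and ties are resolved arbitrarily by taking the first maximum.

```python
# Hypothetical sketch: assign each target nucleotide from four-probe signals.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def call_base(signals):
    """Assign the target nucleotide at one interrogated position.

    ``signals`` maps the nucleotide at each probe's interrogation position
    to its binding signal. The target nucleotide is taken to be the
    complement of the interrogation nucleotide of the strongest probe.
    """
    strongest = max(signals, key=signals.get)
    return COMPLEMENT[strongest]


def compare_to_reference(reference, signals_by_position):
    """Repeat the call for every position of interest.

    Returns tuples of (position, reference nucleotide, called nucleotide,
    whether the call agrees with the reference).
    """
    results = []
    for pos, signals in sorted(signals_by_position.items()):
        called = call_base(signals)
        results.append((pos, reference[pos], called, called == reference[pos]))
    return results
```

A position where the called nucleotide differs from the reference marks a candidate substitution in the target sequence.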

46. A method of comparing a target nucleic acid with a 
reference sequence comprising a predetermined sequence of 
nucleotides, the method comprising: 

(a) hybridizing the target nucleic acid to an array 
of oligonucleotide probes immobilized on a solid support, the 
array comprising: 

a perfectly matched probe comprising a segment of at 
least three nucleotides exactly complementary to a subsequence 
of a reference sequence, the segment having a plurality of 
interrogation positions respectively corresponding to a 
plurality of nucleotides in the reference sequence, wherein 
the reference sequence is from a biotransformation gene; 

for each interrogation position, three mismatched probes, 
each identical to a sequence comprising the perfectly matched 
probe or a subsequence of at least three nucleotides thereof 
including the plurality of interrogation positions, except in 
the interrogation position, which is occupied by a different 
nucleotide in each of the three mismatched probes and the 
perfectly matched probe; 

(b) for each interrogation position, 

(1) comparing the relative specific binding of the 
three mismatched probes and the perfectly matched probe; 

(2) assigning a nucleotide in the target sequence 
as the complement of the interrogation position of the probe 
having the greatest specific binding. 

47. The method of claim 46, wherein the target sequence 
has an undetermined substitution relative to the reference 
sequence, and the method assigns a nucleotide to the 
substitution. 

48. A method of screening a patient for capacity to 
metabolize a drug, the method comprising: 

(a) hybridizing a tissue sample from the patient 
containing a target nucleic acid to an array of 
oligonucleotide probes immobilized on a solid support, the 
array comprising: 

(1) a first probe set comprising a plurality 
of probes, each probe comprising a segment of at least three 
nucleotides exactly complementary to a subsequence of the 
reference sequence from a biotransformation gene which 
metabolizes the drug, the segment including at least one 
interrogation position complementary to a corresponding 
nucleotide in the reference sequence, 

(2) a second probe set comprising a 
corresponding probe for each probe in the first probe set, the 
corresponding probe in the second probe set being identical to 
a sequence comprising the corresponding probe from the first 
probe set or a subsequence of at least three nucleotides 
thereof that includes the at least one interrogation position, 
except that the at least one interrogation position is 
occupied by a different nucleotide in each of the two 
corresponding probes from the first and second probe sets; 

wherein, the probes in the first probe set 
have at least three interrogation positions respectively 
corresponding to each of at least three nucleotides in the 
reference sequence, and 

(b) determining which probes, relative to one 
another, in the first and second probe sets specifically bind to 
the target nucleic acid, the relative specific binding of 
corresponding probes in the first and second probe sets 
indicating whether the target sequence contains a mutation 
relative to the reference sequence, which, if present, impairs 
the capacity of the patient to metabolize the drug. 

49. A method of conducting a clinical trial on a drug, 
the method comprising: 

(a) obtaining a tissue sample containing a target 
nucleic acid from each of a pool of patients; 

(b) for each tissue sample, hybridizing the target 
nucleic acid to an array of oligonucleotide probes immobilized 
on a solid support, the array comprising: 

(1) a first probe set comprising a plurality 
of probes, each probe comprising a segment of at least three 
nucleotides exactly complementary to a subsequence of the 
reference sequence from a biotransformation gene, the segment 
including at least one interrogation position complementary to 
a corresponding nucleotide in the reference sequence, 

(2) a second probe set comprising a 
corresponding probe for each probe in the first probe set, the 
corresponding probe in the second probe set being identical to 
a sequence comprising the corresponding probe from the first 
probe set or a subsequence of at least three nucleotides 
thereof that includes the at least one interrogation position, 
except that the at least one interrogation position is 
occupied by a different nucleotide in each of the two 
corresponding probes from the first and second probe sets; 

wherein, the probes in the first probe set 
have at least three interrogation positions respectively 
corresponding to each of at least three nucleotides in the 
reference sequence; 

(c) determining which probes, relative to one 
another, in the first and second probe sets specifically bind to 
the target nucleic acid, the relative specific binding of 
corresponding probes in the first and second probe sets 
indicating whether the target sequence contains a mutation 
relative to the reference sequence; 

selecting a subpool of patients having a target sequence 
free of the mutation; and 

(d) administering the drug to the subpool of 
patients to determine efficacy. 

50. The method of claim 49, further comprising combining 
the drug with a pharmaceutical carrier to form a 
pharmaceutical composition. 